Identification of Matrices having a Sparse Representation


Götz E. Pfander, Holger Rauhut, Jared Tanner

We consider the problem of recovering a matrix from its action on a known vector in the setting where the matrix can be represented efficiently in a known matrix dictionary. Connections with sparse signal recovery allow for the use of efficient reconstruction techniques such as Basis Pursuit. Of particular interest is the dictionary of time-frequency shift matrices and its role for channel estimation and identification in communications engineering. We present recovery results for Basis Pursuit with the time-frequency shift dictionary and various dictionaries of random matrices.

1. INTRODUCTION 

Inferring reliable information from limited data is a key task in the sciences. For example, identifying a channel operator from its response to a limited number of test signals is a crucial step in radar and communications engineering [25, 32, Ml sa US 119]. Here we consider the canonical setting where an operator is approximated by a linear map, that is, by a matrix $\Gamma \in \mathbb{C}^{n\times m}$. While it is clear that $\Gamma$ is determined by its action on any $m$ vectors that span $\mathbb{C}^m$, significantly fewer measurements may be sufficient if a priori information about the operator is at hand. For instance, one commonly considers the question whether a single test signal $h$, also referred to as an identifier, can be used to identify $\Gamma$ from $\Gamma h$. A priori information guaranteeing that such an $h$ exists is generally deduced from physical considerations, which may ensure that $\Gamma$ can be efficiently represented or approximated using relatively few basic matrices from a known matrix dictionary.

In wireless communications ([iniEHiiii] and references therein) and sonar [39, 50], for example, the narrowband regime of a transmission channel can generally be well approximated by a linear combination of a small number of time-frequency shift matrices. Signals travel from the source to the receiver along a number of different paths, each of which can be modeled by a time shift (a delay depending on the length of the path traveled) and a frequency shift (the Doppler effect caused by the motion of the transmitter, of the receiver, and of reflecting objects) [5, 28]. It is frequently assumed that the number of relevant (but unknown) paths, that is, in slightly simplified terms, the number of involved time-frequency shifts, is relatively small when compared to the symbol length. For example, for mobile communications the number of paths required to well approximate a channel in rural areas or typical urban regimes does not exceed 10 [41, pages 266, 283], see also [13, US]. In wireless communications the benefit of recovering the operator at the receiver is clear: knowledge of the operator is necessary to invert it and to recover the information-carrying channel input from the channel output.


Complexity regularization has recently seen a resurgence of interest in the signal processing community under the monikers sparse signal recovery and sparse approximation. In sparse signal recovery, one seeks the solution of an underdetermined system of equations $Ax = b$, $A \in \mathbb{C}^{n\times N}$, $n < N$, with $x$ having the fewest non-zero entries among all solutions of $Ax = b$. We show in Section 2 that the identification of a matrix from its action on a single test signal falls into the same setting as sparse signal recovery when the matrix is known to have a sparse representation. This observation allows us to adopt efficient algorithms from sparse signal recovery for the sparse matrix identification question. Examples of applications include the channel identification, estimation, or sounding problem described in part above, which has also been considered in the case of time-invariant channels in [mi 33133]. Numerical results based on Basis Pursuit have been obtained for time-varying channels in [IH]. Further, the application of recovery methods for sparsely represented operators to radar measurements is discussed in [SS].


In brief, the content of this paper is organized as follows. In Section 2 we formalize the matrix identification problem for matrices with sparse representations. We establish a connection to the recovery problem of vectors with sparse representations and state the main results, which are proven and discussed in greater detail in Sections 4 and 5. In particular, we consider matrix ensembles of random Gaussian or Bernoulli matrices as well as partial Fourier matrices (Section 2.1 and Section 4).

In Section 2.2 and Section 5 we consider matrix dictionaries of time-frequency shift matrices, which are of particular interest due to their efficacy in approximating time-varying transmission channels. We would like to emphasize that the common framework of the identification problem for matrices with a sparse representation and the sparse signal recovery problem implies that the results achieved on the recovery of matrices with a sparse representation in the dictionary of time-frequency shift matrices are at the same time results for the recovery of signals with a sparse representation in Gabor frames.

In Section 6 we briefly discuss the use of several test vectors instead of just one, and comment on how this improves the corresponding recovery results.


We conclude with numerical experiments in Section 7. They verify our main results concerning sparse representations with time-frequency shift matrices stated in Theorem 2.5 and show that the precise recoverability thresholds follow those proven for Gaussian random matrices in [24]; that is, for matrices having a $k$-sparse representation we observe Basis Pursuit to successfully recover the matrix from its action on a single vector provided $k \le n/(2\log n)$.


2. MAIN RESULTS AND CONTEXT 


Before comparing the matrix identification problem with sparse signal recovery, we formalize the notion of a matrix having a $k$-sparse representation.

Definition 2.1. A matrix $\Gamma$ has a $k$-sparse representation in the matrix dictionary $\Psi = \{\Psi_j\}_{j=1}^N$ if

$$\Gamma = \sum_{j=1}^N x_j \Psi_j \quad\text{with}\quad \|x\|_0 = k,$$

where $\|x\|_0$ counts the number of non-zero entries in $x$, that is, $\|x\|_0 = |\operatorname{supp} x| = \operatorname{card}\{j : x_j \neq 0\}$.

The set of elementary matrices comprising $\Psi$ may form a basis for $\mathbb{C}^{n\times m}$, but it may as well only span a subspace of $\mathbb{C}^{n\times m}$ and/or contain linearly dependent subsets. In Definition 2.1 we place no restrictions on the dictionary $\Psi$.


Identification of matrices having a sparse representation from their action on a single vector (henceforth referred to simply as sparse matrix identification, which is not to be confused with the notion of sparse matrices in numerical analysis) can be formulated as a sparse signal recovery problem through the simple observation that the action of $\Gamma$ on a test signal $h \in \mathbb{C}^m$ can be expressed as

$$\Gamma h = \sum_{j=1}^N x_j \Psi_j h = (\Psi_1 h \,|\, \Psi_2 h \,|\, \cdots \,|\, \Psi_N h)\, x = (\Psi h)\, x, \qquad (1)$$

where $x = (x_1, x_2, \ldots, x_N)^T$ and $(\Psi h) = (\Psi_1 h \,|\, \Psi_2 h \,|\, \cdots \,|\, \Psi_N h)$.

In classical sparse signal recovery the sparsest vector $x$ satisfying $Ax = b$ is sought given $b$ and $A$; to identify the matrix $\Gamma$, $\Gamma h$ takes the place of $b$ and the $j$-th column of $A$ is $\Psi_j h$ for $j = 1, 2, \ldots, N$.
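The identity (1) can be checked numerically. The following is a minimal pure-Python sketch, not taken from the paper: it draws a hypothetical random dictionary of $N$ real $n \times m$ matrices, forms $\Gamma = \sum_j x_j \Psi_j$ for a 2-sparse $x$, and compares $\Gamma h$ with $(\Psi h)x$. The helper names `matvec` and `apply_dictionary` are illustrative only.

```python
import random

def matvec(M, v):
    """Multiply a matrix given as a list of rows by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def apply_dictionary(psi, h):
    """Return the n x N matrix (Psi h) whose j-th column is Psi_j h."""
    cols = [matvec(P, h) for P in psi]
    return [[col[r] for col in cols] for r in range(len(cols[0]))]

rng = random.Random(0)
n, m, N = 4, 3, 6
# hypothetical dictionary: N random real n x m matrices
psi = [[[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)] for _ in range(N)]
x = [0.0, 2.0, 0.0, 0.0, -1.5, 0.0]   # 2-sparse coefficient vector
h = [1.0, -0.5, 0.25]                 # test signal

# Gamma = sum_j x_j Psi_j
gamma = [[sum(x[j] * psi[j][r][c] for j in range(N)) for c in range(m)] for r in range(n)]
lhs = matvec(gamma, h)                      # Gamma h
rhs = matvec(apply_dictionary(psi, h), x)   # (Psi h) x
```

Only the vectors $\Psi_j h$ enter the right-hand side; the matrices $\Psi_j$ themselves are not needed once $h$ is fixed.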

As mentioned above, we note that in case of the $\Psi_j$ being time-frequency shift matrices, the columns in $A = (\Psi h)$ form a Gabor system with window $h$ [12, 29, 37]. Consequently, all our identifiability results concerning representations with time-frequency shift matrices are also results for the recovery of signals that are sparse in a Gabor system.

Remark 2.2. Although sparse matrix identification can be cast as sparse signal recovery, two important differences should be noted.

• In most applications, sparse signal recovery is only of interest for $k$-sparse vectors with $k < n$, as the linear dependence of the $N$ columns of $A \in \mathbb{C}^{n\times N}$, $n < N$, implies that $n$-term solutions $x$ of $Ax = b$ are never unique. However, in some cases an $n$-term solution might be of interest if there is no sparser solution of $Ax = b$. In contrast, the goal in sparse matrix identification is not to represent $b = \Gamma h$ efficiently, but to recover $\Gamma$. The non-uniqueness of $n$-term solutions to $(\Psi h)x = \Gamma h$ implies that there always exist infinitely many $n$-sparse matrices $\Gamma'$ consistent with the observations $\Gamma' h = \Gamma h$. As such, the recovery of an $n$-sparse $x$ in the sparse matrix identification setting does not give any information about the matrix $\Gamma$ to be identified.

• In sparse signal recovery the columns of $A$ are used to represent or to approximate $b$, whereas for sparse matrix identification the matrices $\Psi_j$ are used to represent or approximate $\Gamma$. However, unlike sparse signal recovery, where the columns of $A$ appear explicitly in the reconstruction, the $\Psi_j$ do not appear explicitly when sparse matrix identification is cast as sparse signal recovery (1); rather, only the action of $\Psi_j$ on the test vector $h$ is utilized. The test vector $h \in \mathbb{C}^m$ has no analog in traditional sparse signal recovery, and can be exploited in sparse matrix identification to design desirable characteristics in $(\Psi h)$. This design freedom is utilized extensively in our main results concerning the matrix dictionary of time-frequency shifts, Theorem 2.5.


Note that the computational difficulty in sparse signal recovery, sparse approximation, and our formulation of sparse matrix identification arises from the fact that the support set of the non-zero entries in $x$ is unknown. While the direct solution of finding the sparsest representation of $\Gamma$ in the dictionary $\Psi$,

$$\min \|x'\|_0 \quad\text{subject to}\quad (\Psi h)x' = \Gamma h, \qquad (2)$$

involves a combinatorial search of the support set and is therefore computationally intractable, a number of computationally efficient algorithms have been shown to recover the sparsest solution if appropriate conditions are met. We concentrate here on recoverability conditions for the canonical sparse signal recovery algorithm Basis Pursuit (BP), where the convex problem

$$\min \|x'\|_1 \quad\text{subject to}\quad (\Psi h)x' = \Gamma h, \qquad (3)$$

with $\|x\|_1 = \sum_j |x_j|$, is solved as a proxy to (2).

The convex program (3) can be solved efficiently using well established optimization algorithms, namely second-order cone programming for complex valued systems and linear programming for real valued systems. We give theoretical and numerical evidence for conditions under which the solution of (3) coincides exactly with that of (2). Many other algorithms may also be used as proxies for (2), including Orthogonal Matching Pursuit (OMP), stagewise orthogonal matching pursuit (StOMP) [TB], and an algorithm based upon error correcting codes [2], to name a few. Our principal technical results in Section 5.1 also give results for OMP, but for conciseness we do not state them here, leaving them to the interested reader.
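For real valued systems, (3) becomes a linear program through the standard splitting $x = u - v$ with $u, v \ge 0$, which turns $\|x\|_1$ into the linear objective $\sum_j (u_j + v_j)$. The sketch below only assembles the LP data and checks it on a feasible point. The helpers `bp_as_lp` and `split` are hypothetical. The resulting `c`, `A_eq`, `b_eq` could be handed to any LP solver, for example SciPy's `scipy.optimize.linprog`, whose default variable bounds are $[0, \infty)$. The complex case needs a second-order cone solver instead.

```python
def bp_as_lp(A, b):
    """LP data for real Basis Pursuit: minimize c.z subject to A_eq z = b_eq, z >= 0,
    where z = (u, v) and x = u - v."""
    c = [1.0] * (2 * len(A[0]))
    A_eq = [list(row) + [-a for a in row] for row in A]
    return c, A_eq, list(b)

def split(x):
    """A feasible LP point for a given x: u = max(x, 0), v = max(-x, 0)."""
    return [max(t, 0.0) for t in x] + [max(-t, 0.0) for t in x]

A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
x = [0.0, 0.0, 2.0]
b = [sum(a * t for a, t in zip(row, x)) for row in A]   # b = A x = [2, 2]
c, A_eq, b_eq = bp_as_lp(A, b)
z = split(x)
objective = sum(ci * zi for ci, zi in zip(c, z))         # equals ||x||_1
residual = [sum(a * zi for a, zi in zip(row, z)) - bi for row, bi in zip(A_eq, b_eq)]
```

At an optimum of this LP, at most one of $u_j, v_j$ is non-zero for each $j$, since lowering both would reduce the objective. The optimal value is therefore the minimal $\ell_1$ norm.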


In practice, the measured vector $\Gamma h$ will be contaminated by noise. In addition, the operator $\Gamma$ will not be strictly sparse, but will instead be well approximated by a sparse representation. In this case the minimization problem (3) will be replaced by its well known variant

$$\min \|x'\|_1 \quad\text{subject to}\quad \|(\Psi h)x' - \Gamma h\|_2 \le \varepsilon, \qquad (4)$$

where $\|z\|_2 = \big(\sum_j |z_j|^2\big)^{1/2}$ as usual.

2.1. Dictionaries of random matrices 


Many known results in sparse signal recovery, sparse approximation and their companion theory of compressed sensing involve random matrices [misiisiiMiiis]. Based on these results, we obtain recovery results for matrix dictionaries whose member matrices are all chosen at random. From a practical point of view such random matrix dictionaries do not seem to be useful in the sparse matrix identification setting. Nevertheless, these statements give some insight into the sparse matrix identification question, as they give guidance on what kind of results to seek in the mathematical analysis of structured and more application-relevant matrix dictionaries.

Theorem 2.3. Let $h$ be a non-zero vector in $\mathbb{R}^m$.

(a) Let all entries of the $N$ matrices $\Psi_j \in \mathbb{R}^{n\times m}$, $j = 1, \ldots, N$, be chosen independently according to a standard normal distribution (Gaussian ensemble); or

(b) let all entries of the $N$ matrices $\Psi_j \in \mathbb{R}^{n\times m}$, $j = 1, \ldots, N$, be independent Bernoulli $\pm 1$ variables (Bernoulli ensemble).

Then there exists a positive constant $c$ so that for $\varepsilon > 0$,

$$k \le c\,\frac{n}{\log(N/\varepsilon)}$$

implies that with probability at least $1 - \varepsilon$ all matrices $\Gamma$ having a $k$-sparse representation with respect to $\Psi = \{\Psi_j\}$ can be recovered from $\Gamma h$ by Basis Pursuit (3).


Using Theorem 3.6, this recovery result can be made stable under perturbation of $\Gamma h$ by noise. It also applies when $\Gamma$ is not exactly $k$-sparse but can be well approximated by a $k$-sparse operator.

Precise information on the constant $c$ will be given in Section 4. In the case of the Gaussian ensemble, Donoho and Tanner [dmilTlEnlElCIj] obtained sharp thresholds separating regions in the $(k/n, n/N)$ plane where recovery holds or fails with high probability. Section 4.1 recounts these and additional results on Gaussian systems. Theorem 2.3(b) is proven in Section 4.2, and similar results for certain diagonal matrices are proven in Section 4.3.


2.2. The dictionary of time-frequency shift matrices 

As outlined in the introduction, the matrix dictionary of time-frequency shifts appears naturally in the channel identification problem in wireless communications [5] or sonar [50]. Due to physical considerations, wireless channels may indeed be modeled by sparse linear combinations of time-frequency shifts $M_\ell T_p$, where the periodic translation operators $T_p$ and modulation operators $M_\ell$ on $\mathbb{C}^n$ are given by

$$(T_p h)_q = h_{(q-p) \bmod n}, \qquad (M_\ell h)_q = e^{2\pi i \ell q/n}\, h_q. \qquad (5)$$
The system of time-frequency shifts,

$$\mathcal{G} = \{M_\ell T_p : \ell, p = 0, \ldots, n-1\}, \qquad (6)$$

forms a basis of $\mathbb{C}^{n\times n}$, and for any non-zero $h$, the vector dictionary $\mathcal{G}h$ is a Gabor system [29, 35, 37].

Below, we focus on the so-called Alltop window $h^A$ [3, El] with entries

$$h^A_q = \frac{1}{\sqrt{n}}\, e^{2\pi i q^3/n}, \qquad q = 0, \ldots, n-1, \qquad (7)$$

and the randomly generated window $h^R$ with entries

$$h^R_q = \frac{1}{\sqrt{n}}\, \epsilon_q, \qquad q = 0, \ldots, n-1, \qquad (8)$$

where the $\epsilon_q$ are independent and uniformly distributed on the torus $\{z \in \mathbb{C} : |z| = 1\}$.
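The operators in (5) and the windows (7) and (8) are straightforward to implement. The pure-Python sketch below assumes the translation convention $(T_p h)_q = h_{(q-p) \bmod n}$, which matches the indexing $\epsilon_{q-p}$ used in the proof of Theorem 5.1. The function names are hypothetical. It builds the $n^2$ vectors $M_\ell T_p h$ of the Gabor system $\mathcal{G}h$ and checks that they have unit norm when $\|h\|_2 = 1$.

```python
import cmath
import math
import random

def translate(h, p):
    """Periodic translation: (T_p h)_q = h_{(q - p) mod n}."""
    n = len(h)
    return [h[(q - p) % n] for q in range(n)]

def modulate(h, l):
    """Modulation: (M_l h)_q = exp(2 pi i l q / n) h_q."""
    n = len(h)
    return [cmath.exp(2j * math.pi * l * q / n) * h[q] for q in range(n)]

def alltop_window(n):
    """Alltop window (7); q^3 is reduced mod n so the phase stays exact in floating point."""
    return [cmath.exp(2j * math.pi * (q**3 % n) / n) / math.sqrt(n) for q in range(n)]

def random_window(n, rng):
    """Random unimodular window (8): independent uniform phases scaled by 1/sqrt(n)."""
    return [cmath.exp(2j * math.pi * rng.random()) / math.sqrt(n) for _ in range(n)]

def gabor_system(h):
    """All n^2 vectors M_l T_p h, ordered by (l, p)."""
    n = len(h)
    return [modulate(translate(h, p), l) for l in range(n) for p in range(n)]

atoms = gabor_system(alltop_window(7))
norms = [math.sqrt(sum(abs(z) ** 2 for z in a)) for a in atoms]
```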


Invoking existing recovery results [22, EH E2l E3] (see Theorems 3.1 and 3.2 below) and our results on the coherence of the Gabor systems $\mathcal{G}h^A$ and $\mathcal{G}h^R$ in Section 5.1 (see Section 5.2), we will obtain the following.


Theorem 2.4.

(a) Let $n$ be prime and $h^A$ be the Alltop window defined in (7). If $k < (\sqrt{n}+1)/2$, then Basis Pursuit recovers from $\Gamma h^A$ all matrices $\Gamma \in \mathbb{C}^{n\times n}$ having a $k$-sparse representation $\Gamma = \sum_{(\ell,p)\in\Lambda} x_{\ell p} M_\ell T_p$, $|\Lambda| = k$, with respect to the time-frequency shift dictionary $\mathcal{G}$ given in (6).

(b) Let $n$ be even and choose $h^R$ to be the random unimodular window in (8). Let $t > 0$ and suppose

$$k \le \frac{1}{4}\sqrt{\frac{n}{2\log n + \log 4 + t}} + \frac{1}{2}. \qquad (9)$$

Then with probability at least $1 - e^{-t}$ Basis Pursuit recovers from $\Gamma h^R$ all matrices $\Gamma \in \mathbb{C}^{n\times n}$ having a $k$-sparse representation with respect to the time-frequency shift dictionary $\mathcal{G}$ given in (6).

A slight variation of part (b) also holds for $n$ odd, but is omitted for conciseness. Further note that Theorem 2.4 also holds with Basis Pursuit replaced by Orthogonal Matching Pursuit [52]. Moreover, Theorem 3.2 shows that recovery is stable under perturbation of $\Gamma h^A$ and $\Gamma h^R$ by noise.
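The sparsity levels allowed by Theorem 2.4 can be evaluated directly. The sketch below uses hypothetical function names and natural logarithms. It computes the largest $k$ with $k < (\sqrt{n}+1)/2$ for part (a), which is the condition $(2k-1)\mu < 1$ of Theorem 3.1 with $\mu = 1/\sqrt{n}$, and it computes the right-hand side of (9) for part (b). It also evaluates the coherence tail estimate $4n^2\exp(-n/(4(2k-1)^2))$ used in the proof of part (b) and checks that it equals $e^{-t}$ exactly at the boundary value of $k$ in (9).

```python
import math

def k_max_alltop(n):
    """Largest integer k with k < (sqrt(n) + 1) / 2, i.e. (2k - 1) / sqrt(n) < 1."""
    return math.ceil((math.sqrt(n) + 1) / 2) - 1

def k_bound_random(n, t):
    """Right-hand side of (9): (1/4) sqrt(n / (2 log n + log 4 + t)) + 1/2."""
    return 0.25 * math.sqrt(n / (2 * math.log(n) + math.log(4) + t)) + 0.5

def coherence_tail(n, k):
    """Estimate 4 n^2 exp(-n / (4 (2k - 1)^2)) of P(mu >= 1/(2k - 1)) for the random window."""
    return 4 * n**2 * math.exp(-n / (4 * (2 * k - 1) ** 2))

print(k_max_alltop(101))           # 5
print(k_bound_random(1024, 3.0))   # about 2.37, so k <= 2 with probability >= 1 - e^{-3}
```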


In contrast with Theorem 2.3 for random matrices, where $k$ is allowed to be of order $O(n/\log n)$, Theorem 2.4 requires $k$ to be of order $\sqrt{n}$ or $\sqrt{n/\log n}$. Substantially larger order thresholds, $O(n/\log n)$ for $h^A$ and $O(n/\log^2 n)$ for $h^R$, are also possible when identifying a matrix $\Gamma$ which is a linear combination of a small number of time-frequency shift matrices. However, this larger regime of successful recovery necessitates passing from a worst case analysis for sparse $\Gamma$ to an average case analysis, in the sense that the coefficient vector $x$ is chosen at random. Theorem 2.5 will follow from recent work by Tropp [54] and our coherence results in Section 5.1; see Section 5.3.


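Theorem 2.5. Let $k \ge 3$ and let $\Lambda$ be chosen uniformly at random among all subsets of $\{0, \ldots, n-1\}^2$ of cardinality $k$. Suppose further that $x \in \mathbb{C}^{n\times n}$ has support $\Lambda$ with random phases $(\operatorname{sgn}(x_{\ell p}))_{(\ell,p)\in\Lambda}$ that are independent and uniformly distributed on the torus $\{z : |z| = 1\}$. Let

$$\Gamma = \sum_{(\ell,p)\in\Lambda} x_{\ell p} M_\ell T_p.$$

(a) Let $n$ be prime and choose the Alltop window $h^A$ from (7). Assume that for $\varepsilon > 0$

$$k \le \frac{n}{8\log(2n^2/\varepsilon)}$$

and

$$s := \frac{1}{144}\Big(\frac{e^{-1/4}}{2} - \frac{2k}{n}\Big)^2 \frac{n}{k\log(k/2+1)} \ge 1.$$

Then with probability at least $1 - \big(\varepsilon + (k/2)^{-s}\big)$ Basis Pursuit recovers $\Gamma$ from $\Gamma h^A$.

(b) Let $n$ be an even number and choose the random window $h^R$ from (8). Assume

$$k \le \frac{n}{32(\sigma+2)\log(n)\log(2n^2/\varepsilon)} \qquad (10)$$

for some $\sigma > 0$ and

$$s := \frac{1}{576(\sigma+2)\log(n)}\Big(\frac{e^{-1/4}}{2} - \frac{2k}{n}\Big)^2 \frac{n}{k\log(k/2+1)} \ge 1. \qquad (11)$$

Then with probability at least $1 - \big(\varepsilon + 4n^{-\sigma} + (k/2)^{-s}\big)$ Basis Pursuit recovers $\Gamma$ from $\Gamma h^R$. (A similar result also holds for $n$ odd.)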










In simple terms, Theorem 2.5 states that $\Gamma$ can be recovered from $\Gamma h^A$ or $\Gamma h^R$ with high probability $1 - \varepsilon$ provided that the sparsity of $\Gamma$ satisfies $k \le C_\varepsilon n/\log n$ in the case of $h^A$ and $k \le C'_\varepsilon n/\log^2 n$ in the case of $h^R$.
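The quantities in Theorem 2.5(a) can be evaluated for concrete parameters. The sketch below uses a hypothetical function name `theorem_25a` and natural logarithms. It checks the sparsity condition $k \le n/(8\log(2n^2/\varepsilon))$, computes $s$, and returns the failure probability bound $\varepsilon + (k/2)^{-s}$. For the prime $n = 10007$ with $k = 4$ and $\varepsilon = 0.01$, one gets $s \approx 2.4$. Both hypotheses then hold, and the failure probability is at most about $0.2$.

```python
import math

def theorem_25a(n, k, eps):
    """Evaluate the hypotheses of Theorem 2.5(a) (Alltop window, n prime).

    Returns (sparsity_ok, s, failure_bound) with
      sparsity_ok  : k <= n / (8 log(2 n^2 / eps))
      s            : (1/144) (e^{-1/4}/2 - 2k/n)^2 n / (k log(k/2 + 1))
      failure_bound: eps + (k/2)^(-s)
    """
    sparsity_ok = k <= n / (8 * math.log(2 * n**2 / eps))
    gap = math.exp(-0.25) / 2 - 2 * k / n
    if gap <= 0:
        # the defining condition cannot be met, so no guarantee is available
        return sparsity_ok, 0.0, eps + 1.0
    s = gap**2 * n / (144 * k * math.log(k / 2 + 1))
    return sparsity_ok, s, eps + (k / 2) ** (-s)

ok, s, bound = theorem_25a(10007, 4, 0.01)
```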


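In Section 5.4 we use a simple argument from time-frequency analysis to obtain the following.

Corollary 2.6. Theorems 2.4, 2.5, and 5.1 also hold with the windows $h^A$ and $h^R$ replaced by their Fourier transforms $\widehat{h^A}$ and $\widehat{h^R}$, with entries defined as $\hat{h}_j = \frac{1}{\sqrt{n}}\sum_{q=0}^{n-1} h_q e^{-2\pi i jq/n}$.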
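The following is a quick numerical illustration of Corollary 2.6 for the coherence statement of Theorem 5.1. It assumes the unitary discrete Fourier transform $\hat h_j = n^{-1/2}\sum_q h_q e^{-2\pi i jq/n}$ and the convention $(T_p h)_q = h_{(q-p) \bmod n}$. For a random unimodular window, the Gabor systems generated by $h^R$ and by its Fourier transform have the same coherence. This is because the Fourier transform maps $M_\ell T_p h$ to a unimodular multiple of $M_{-p} T_\ell \hat{h}$, so it permutes the system up to phases. The helper names in the pure-Python sketch are hypothetical.

```python
import cmath
import math
import random

def gabor_atoms(h):
    """The n^2 vectors M_l T_p h with (T_p h)_q = h_{(q - p) mod n}."""
    n = len(h)
    return [[cmath.exp(2j * math.pi * l * q / n) * h[(q - p) % n] for q in range(n)]
            for l in range(n) for p in range(n)]

def coherence(cols):
    """Largest |<a_r, a_s>| over distinct columns, as in (12)."""
    best = 0.0
    for r in range(len(cols)):
        for s in range(r + 1, len(cols)):
            best = max(best, abs(sum(a * b.conjugate() for a, b in zip(cols[r], cols[s]))))
    return best

def unitary_dft(h):
    """hat h_j = n^{-1/2} sum_q h_q exp(-2 pi i j q / n)."""
    n = len(h)
    return [sum(h[q] * cmath.exp(-2j * math.pi * j * q / n) for q in range(n)) / math.sqrt(n)
            for j in range(n)]

rng = random.Random(4)
n = 8
h_r = [cmath.exp(2j * math.pi * rng.random()) / math.sqrt(n) for _ in range(n)]
mu = coherence(gabor_atoms(h_r))
mu_hat = coherence(gabor_atoms(unitary_dft(h_r)))
```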

3. TOOLS IN SPARSE SIGNAL RECOVERY 

It was shown in (1) that for any test signal $h$, we have $\Gamma h = (\Psi h)x$, where $x$ is the sparse coefficient vector of $\Gamma$. This observation links the sparse matrix identification question with sparse signal recovery, where one seeks the sparsest solution (2) to the underdetermined system $Ax = b$. In the sparse matrix identification setting, $(\Psi h) = (\Psi_1 h \,|\, \cdots \,|\, \Psi_N h)$ takes the place of $A$ and $\Gamma h$ the place of $b$. In contrast to sparse approximation, where the dictionary $A$ is usually fixed, for sparse matrix identification we have the additional freedom of designing the test signal $h$ in order for $(\Psi h)$ to have desirable properties.

Let us briefly recall known results in sparse signal recovery and sparse approximation that we apply to the sparse matrix identification question. In Section 3.1 we review the notion of coherence (12) and its implications for sparse signal recovery and approximation using Basis Pursuit, (3) and (4), as well as Orthogonal Matching Pursuit. In Section 3.2 we review the restricted isometry property, allowing for improved recoverability results for Basis Pursuit.


3.1. Coherence 

The recoverability properties of sparse signal recovery algorithms for an underdetermined system $Ax = b$ are often measured by the coherence of $A$,

$$\mu = \max_{r\neq s} |\langle a_r, a_s\rangle|, \qquad (12)$$

where $a_r$ is the $r$-th column of $A$ and $\|a_r\|_2 = 1$ for all $r$.
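Definition (12) translates directly into code. As a sanity check, the sketch below (hypothetical helper names) uses the union of the standard basis and the normalized Fourier basis of $\mathbb{C}^n$. This dictionary is not discussed in this paper, but its coherence is known to be exactly $1/\sqrt{n}$.

```python
import cmath
import math

def coherence(cols):
    """mu = max_{r != s} |<a_r, a_s>| for unit-norm columns a_r, cf. (12)."""
    mu = 0.0
    for r in range(len(cols)):
        for s in range(r + 1, len(cols)):
            mu = max(mu, abs(sum(a * b.conjugate() for a, b in zip(cols[r], cols[s]))))
    return mu

def spikes_and_sines(n):
    """Columns e_0, ..., e_{n-1} followed by f_l = (exp(2 pi i l q / n) / sqrt(n))_q."""
    spikes = [[1.0 if q == r else 0.0 for q in range(n)] for r in range(n)]
    sines = [[cmath.exp(2j * math.pi * l * q / n) / math.sqrt(n) for q in range(n)]
             for l in range(n)]
    return spikes + sines

mu4 = coherence(spikes_and_sines(4))    # 1/sqrt(4) = 0.5
mu16 = coherence(spikes_and_sines(16))  # 1/sqrt(16) = 0.25
```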

Theorem 3.1 (Tropp [52]; Donoho, Elad [21]). Let $A$ be a unit norm dictionary with coherence $\mu$. If

$$(2k-1)\mu < 1,$$

then Basis Pursuit (as well as Orthogonal Matching Pursuit) recovers all $k$-sparse vectors $x$ from $b = Ax$. Recovery is also stable under perturbation by noise when Basis Pursuit (3) is replaced with (4).
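Theorem 3.1 also covers Orthogonal Matching Pursuit, which, unlike Basis Pursuit, needs no convex solver. The short pure-Python sketch of OMP below is a textbook version for illustration, not code from the paper, and all names in it are hypothetical. It recovers a 2-sparse combination of time-frequency shifts from its action on the Alltop window for $n = 13$, where $(2k-1)\mu = 3/\sqrt{13} < 1$, so Theorem 3.1 guarantees success. The convention $(T_p h)_q = h_{(q-p) \bmod n}$ is assumed, and atom $\ell n + p$ stores $M_\ell T_p h$.

```python
import cmath
import math

def alltop_window(n):
    # h_q = exp(2 pi i q^3 / n) / sqrt(n); q^3 is reduced mod n to keep the phase exact
    return [cmath.exp(2j * math.pi * (q**3 % n) / n) / math.sqrt(n) for q in range(n)]

def gabor_atoms(h):
    # atom with index l * n + p holds M_l T_p h, assuming (T_p h)_q = h_{(q - p) mod n}
    n = len(h)
    return [[cmath.exp(2j * math.pi * l * q / n) * h[(q - p) % n] for q in range(n)]
            for l in range(n) for p in range(n)]

def inner(a, b):
    return sum(x * y.conjugate() for x, y in zip(a, b))

def solve(G, r):
    """Solve the small square system G c = r by Gaussian elimination with partial pivoting."""
    m = len(G)
    M = [list(row) + [r[i]] for i, row in enumerate(G)]
    for col in range(m):
        piv = max(range(col, m), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, m):
            f = M[i][col] / M[col][col]
            for j in range(col, m + 1):
                M[i][j] -= f * M[col][j]
    c = [0j] * m
    for i in range(m - 1, -1, -1):
        c[i] = (M[i][m] - sum(M[i][j] * c[j] for j in range(i + 1, m))) / M[i][i]
    return c

def omp(atoms, b, k):
    """Orthogonal Matching Pursuit: k greedy selections, each followed by a least-squares refit."""
    support, residual, coef = [], list(b), []
    for _ in range(k):
        # pick the atom most correlated with the current residual
        j = max(range(len(atoms)), key=lambda i: abs(inner(residual, atoms[i])))
        support.append(j)
        # normal equations: sum_s <a_s, a_t> c_s = <b, a_t> for every t in the support
        G = [[inner(atoms[s], atoms[t]) for s in support] for t in support]
        coef = solve(G, [inner(b, atoms[t]) for t in support])
        residual = [b[q] - sum(c * atoms[s][q] for c, s in zip(coef, support))
                    for q in range(len(b))]
    return dict(zip(support, coef))

n = 13
atoms = gabor_atoms(alltop_window(n))
truth = {5: 1.0, 100: -0.7 + 0.4j}    # 2-sparse coefficients: (l, p) = (0, 5) and (7, 9)
b = [sum(c * atoms[j][q] for j, c in truth.items()) for q in range(n)]   # Gamma h^A
recovered = omp(atoms, b, k=2)
```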

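Theorem 3.2 (Donoho et al. [22], Theorem 3.1). Let $A$ and $\mu$ be as above and suppose that $(4k-1)\mu < 1$. Assume that $x$ is $k$-sparse and we have perturbed observations $b = Ax + z$ with $\|z\|_2 \le \varepsilon$. Then the solution $x^\#$ of the Basis Pursuit variant

$$\min \|x'\|_1 \quad\text{subject to}\quad \|Ax' - b\|_2 \le \delta, \qquad \delta \ge \varepsilon,$$

satisfies

$$\|x^\# - x\|_2^2 \le \frac{(\varepsilon + \delta)^2}{1 - \mu(4k-1)}.$$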


Theorems 3.1 and 3.2 ensure that the solutions of (3) and (4) correspond (exactly and approximately, respectively) to the solution of (2) for all $k$-sparse $x$. For a broad class of dictionaries the coherence is of order $O(1/\sqrt{n})$; see Sections 4 and 5 for random and Gabor dictionaries, respectively. Hence, Theorems 3.1 and 3.2 ensure (stable) recovery provided $k = O(\sqrt{n})$.


In contrast to these $O(\sqrt{n})$ thresholds, which are valid for all $x$, Tropp [52] developed a general framework for the analysis of Basis Pursuit (3). This framework is still based on the coherence of a general dictionary, but shows that (3) is often successful for substantially larger $k$ than those considered in Theorems 3.1 and 3.2. This comes, however, at the cost of assuming a random model on the sparse signal to be recovered. It allows us to prove recoverability results of order $O(n/\log n)$ for $h^A$ and $O(n/\log^2 n)$ for $h^R$ for the time-frequency shift dictionary, Theorem 2.5. We state the results of Tropp, where $\|\cdot\|_{2,2}$ denotes the operator norm given by $\|A\|_{2,2} = \sup_{\|x\|_2 = 1}\|Ax\|_2$, and $A_\Lambda$ is the restriction of a matrix $A$ to the columns indexed by $\Lambda$.

Theorem 3.3 (Tropp [54], Theorem 12). Let $A$ be an $n \times N$ vector dictionary with unit norm columns and coherence $\mu$. Suppose that $\Lambda$ is selected uniformly at random among all subsets of $\{1, \ldots, N\}$ of size $k \ge 3$. Let $s \ge 1$. Then

$$\sqrt{144\, s\, \mu^2 k \log(k/2+1)} + \frac{2k}{N}\|A\|_{2,2}^2 \le e^{-1/4}\,\delta \qquad (13)$$

implies

$$\mathbb{P}\big(\|A_\Lambda^* A_\Lambda - \mathrm{Id}\|_{2,2} \ge \delta\big) \le (k/2)^{-s}.$$









Theorem 3.4 (Tropp [54], Theorem 13). Let $A$ be an $n \times N$ dictionary with coherence $\mu$. Suppose $\Lambda \subset \{1, \ldots, N\}$ of cardinality $k$ ($|\Lambda| = k$) is such that

$$\|A_\Lambda^* A_\Lambda - \mathrm{Id}\|_{2,2} < 1/2.$$

Suppose that $x \in \mathbb{C}^N$ has support $\Lambda$ with random phases $\operatorname{sgn}(x_r)$, $r \in \Lambda$, that are independent and uniformly distributed on the torus $\{z : |z| = 1\}$. Then with probability at least $1 - 2N e^{-1/(8\mu^2 k)}$ the sparse vector $x$ can be recovered from $b = Ax$ by Basis Pursuit.


3.2. Restricted isometry property 

Candès, Romberg and Tao introduced the Restricted Isometry Property (RIP) [UM], which is an alternative perspective to coherence.

Definition 3.5. Let $A \in \mathbb{C}^{n\times N}$ and $k < n$. The restricted isometry constant $\delta_k = \delta_k(A)$ is the smallest number such that

$$(1-\delta_k)\|x\|_2^2 \le \|Ax\|_2^2 \le (1+\delta_k)\|x\|_2^2$$

for all $k$-sparse $x$.

$A$ is said to satisfy the restricted isometry property if it has small isometry constants, say $\delta_k < 1/2$; such matrices allow stable sparse recovery by Basis Pursuit.
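For very small dictionaries, $\delta_2$ can be computed by brute force. Every 2-sparse vector is supported on some pair of columns, and the Gram matrix of a pair is a Hermitian $2\times 2$ matrix with explicit eigenvalues. The sketch below (hypothetical names) does this for the Gabor system generated by the Alltop window with $n = 5$, assuming $(T_p h)_q = h_{(q-p) \bmod n}$. For unit-norm columns the eigenvalues are $1 \pm |\langle a_r, a_s\rangle|$, so $\delta_2$ coincides with the coherence $\mu$, here $1/\sqrt{5}$.

```python
import cmath
import math
from itertools import combinations

def inner(a, b):
    return sum(x * y.conjugate() for x, y in zip(a, b))

def delta_2(cols):
    """Restricted isometry constant delta_2 by brute force over all column pairs."""
    delta = 0.0
    for a, b in combinations(cols, 2):
        alpha, gamma = inner(a, a).real, inner(b, b).real
        beta = abs(inner(a, b))
        mid = (alpha + gamma) / 2
        rad = math.sqrt(((alpha - gamma) / 2) ** 2 + beta ** 2)
        # the 2 x 2 Gram matrix has eigenvalues mid - rad and mid + rad
        delta = max(delta, (mid + rad) - 1, 1 - (mid - rad))
    return delta

n = 5
h = [cmath.exp(2j * math.pi * (q**3 % n) / n) / math.sqrt(n) for q in range(n)]
atoms = [[cmath.exp(2j * math.pi * l * q / n) * h[(q - p) % n] for q in range(n)]
         for l in range(n) for p in range(n)]
d2 = delta_2(atoms)
```

Brute force over supports grows combinatorially in $k$, which is why probabilistic estimates such as those in Section 4 are used in practice.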

Theorem 3.6 (Candès, Romberg and Tao [8]). Assume that the restricted isometry constants of $A$ satisfy

$$\delta_{3k} + 3\delta_{4k} < 2.$$

Let $x \in \mathbb{C}^N$ and assume we have noisy data $y = Ax + \eta$ with $\|\eta\|_2 \le \varepsilon$. Denote by $x_k$ the truncated vector corresponding to the $k$ largest absolute values of $x$. Then the solution $x^\#$ of (4) satisfies

$$\|x^\# - x\|_2 \le C_1\varepsilon + C_2\,\frac{\|x - x_k\|_1}{\sqrt{k}}.$$

The constants $C_1$ and $C_2$ depend only on $\delta_{3k}$ and $\delta_{4k}$.


Note that for $k$-sparse $x$ and noise level $\varepsilon = 0$, Theorem 3.6 guarantees exact recovery of $x$ by (3).
4. RANDOM MATRICES


Many of the recent results in sparse signal recovery with recoverability thresholds of the form $k \le Cn/\log n$ assume that $A$ is either a random Gaussian or Bernoulli matrix [m 0 la iS] or a partial random Fourier matrix [a i36i sa El HZ]. Recoverability results in these cases can be obtained by establishing the restricted isometry property (see Definition 3.5) or through a careful analysis of the geometric structure of the convex hull associated with the columns of $A$ [dziiiaiiaisaE!]. We apply these results to the matrix identification problem when the matrix has a sparse representation in terms of certain random matrices.


4.1. Gaussian matrix ensemble 

Assume all entries of the $N$ matrices $\Psi_j \in \mathbb{R}^{n\times m}$ are independent standard Gaussian random variables and $h$ is an arbitrary non-zero vector in $\mathbb{R}^m$. Then the entries of the dictionary $A = (\Psi h) \in \mathbb{R}^{n\times N}$, whose columns are given by $\Psi_j h$, $j = 1, \ldots, N$, are jointly independent and of the form $Z = \sum_{\ell=1}^m g_\ell h_\ell$, where the $g_\ell$ are independent standard Gaussian random variables. By rotational invariance of the distribution of the Gaussian vector $(g_1, \ldots, g_m)$, the random variable $Z$ has the same distribution as $\|h\|_2\, g$, where $g$ is a (scalar-valued) standard Gaussian. Hence, the dictionary $(\Psi h)$ has the same distribution as $\|h\|_2 \tilde{A}$, where $\tilde{A} \in \mathbb{R}^{n\times N}$ is a random matrix whose entries are independent standard Gaussians. Thus, the existing literature in sparse approximation concerning Gaussian matrices applies; see for instance [HISlliaEllE] and additional results discussed in the remainder of this section.
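The distributional identity used above is easy to check by simulation. The following pure-Python sketch uses hypothetical names and an arbitrary example vector $h$. It samples entries $Z = \sum_\ell g_\ell h_\ell$ of $(\Psi h)$ and compares their empirical mean and variance with $0$ and $\|h\|_2^2$.

```python
import random
import statistics

def dictionary_entry(rng, h):
    """One entry of Psi_j h for a Gaussian Psi_j: sum_l g_l h_l with fresh g_l ~ N(0, 1)."""
    return sum(rng.gauss(0.0, 1.0) * hl for hl in h)

rng = random.Random(1)
h = [0.3, -1.2, 0.5, 2.0]
samples = [dictionary_entry(rng, h) for _ in range(20000)]
norm_sq = sum(hl * hl for hl in h)        # ||h||_2^2, approximately 5.78
emp_mean = statistics.fmean(samples)
emp_var = statistics.pvariance(samples)
```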


In particular, the restricted isometry property ensures stable recovery with probability at least $1 - \varepsilon$ provided

$$k \le c\,\frac{n}{\log(N/\varepsilon)}. \qquad (14)$$

Hence, by Theorem 3.6 we have stable recovery by (4) in this regime, and the statement of Theorem 2.3(a) follows.


The work of Donoho and Tanner actually allows for a stronger statement than (14) in the context of noise-free and exactly $k$-sparse vectors $x$. A simple version of their results says that most $k$-sparse $\Gamma$ can be recovered with high probability by Basis Pursuit provided $k \le n/(2\log(N/n))$. For details we refer to their papers, and for the extension to the noisy setting to Wainwright's work [55].
4.2. Bernoulli matrix ensemble


The recoverability results for Bernoulli matrices in Theorem 2.3(b) are based on establishing the restricted isometry property given in Definition 3.5.


To this end, we assume that the entries of the $N$ matrices $\Psi_j \in \mathbb{R}^{n\times m}$ in $\Psi$ are selected as independent $\pm 1$ Bernoulli variables, that is, $+1$ or $-1$ with equal probability, and let $h$ be an arbitrary non-zero vector. Then an entry of the dictionary $A = (\Psi h)$ is given by

$$a_{pq} = \sum_{\ell=1}^m \epsilon^{(q)}_{p\ell}\, h_\ell, \qquad p = 1, \ldots, n, \quad q = 1, \ldots, N, \qquad (15)$$

where the $\epsilon^{(q)}_{p\ell}$ are independent Bernoulli variables, that is, the $a_{pq}$ are independent Rademacher series.


Theorem 4.1 shows that the matrix $A$ has the restricted isometry property with high probability for sparsities $k$ that are nearly linear in $n$. Hence, by Theorem 3.6, for an arbitrary non-zero choice of $h$ we can recover any $\Gamma$ having a $k$-sparse representation in terms of random Bernoulli matrices from $\Gamma h$ through Basis Pursuit (3).

Theorem 4.1. Let $h \in \mathbb{R}^m$ be normalized by $\|h\|_2 = 1/\sqrt{n}$. Let $A$ be the random matrix with entries defined in (15). Assume $\delta \in (0,1)$ and $t > 0$. If

$$n \ge C_1 \delta^{-2}\big(k\log(N/k) + \log(2e + 24e/\delta) + t\big), \qquad (16)$$

then with probability at least $1 - e^{-t}$ the restricted isometry property is satisfied, that is, for all $\Lambda \subset \{1, \ldots, N\}$ of cardinality at most $k$ it holds that

$$(1-\delta)\|x\|_2^2 \le \|Ax\|_2^2 \le (1+\delta)\|x\|_2^2$$

for all $x$ supported on $\Lambda$. The constant satisfies $C_1 \le 23.15$.

Proof. Let $v \in \mathbb{R}^N$ be an arbitrary vector. We form the inner product of a row of $A$ with $v$,

$$X_p = \sum_{q=1}^N a_{pq} v_q = \sum_{q=1}^N \sum_{\ell=1}^m \epsilon^{(q)}_{p\ell}\, h_\ell v_q.$$

By independence of the $\epsilon^{(q)}_{p\ell}$, the $X_p$ are independent as well. By Khintchine's inequality, the even moments of $X_p$ can be estimated by the moments of a standard Gaussian variable $g$ [38, 02]:

$$\mathbb{E}|X_p|^{2r} \le \|h\|_2^{2r}\|v\|_2^{2r}\,\mathbb{E}|g|^{2r} = n^{-r}\|v\|_2^{2r}\,\mathbb{E}|g|^{2r}, \qquad r \in \mathbb{N}.$$

Following Lemma 5 and the proof of Lemma 6 in [1], this implies the concentration inequality

$$\mathbb{P}\Big(\big|\|Av\|_2^2 - \|v\|_2^2\big| > \epsilon\|v\|_2^2\Big) \le 2\exp\Big(-\frac{n}{2}\big(\epsilon^2/2 - \epsilon^3/3\big)\Big).$$

By Theorem 2.2 in [46] (see also Theorem 5.2 in [1]), this implies that the restricted isometry property holds under the stated condition on $n$. The estimate of the constant $C_1$ follows from [46, Theorem 2.2] as well. □
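The concentration inequality in the proof can be illustrated by simulation. This pure-Python sketch uses hypothetical names. It draws $A = (\Psi h)$ with independent $\pm 1$ entries in each $\Psi_j$ as in (15), normalizes $\|h\|_2 = 1/\sqrt{n}$ as in Theorem 4.1, and records $\|Av\|_2^2/\|v\|_2^2$ for a fixed $v$ over several draws. The ratios cluster around 1, with fluctuations of order $\sqrt{2/n}$.

```python
import math
import random

def bernoulli_dictionary(n, N, h, rng):
    """n x N matrix (Psi h) with a_pq = sum_l eps_pl^(q) h_l and independent +-1 signs, as in (15)."""
    return [[sum(rng.choice((-1.0, 1.0)) * hl for hl in h) for _ in range(N)]
            for _ in range(n)]

rng = random.Random(5)
n, m, N = 200, 4, 50
h = [1.0 / math.sqrt(m * n)] * m          # ||h||_2 = 1/sqrt(n), as in Theorem 4.1
v = [rng.gauss(0.0, 1.0) for _ in range(N)]
v_norm_sq = sum(t * t for t in v)

ratios = []
for _ in range(10):
    A = bernoulli_dictionary(n, N, h, rng)
    Av = [sum(a * vq for a, vq in zip(row, v)) for row in A]
    ratios.append(sum(t * t for t in Av) / v_norm_sq)
mean_ratio = sum(ratios) / len(ratios)
```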


Note that for fixed $\delta$ and $t$, condition (16) can be rewritten as

$$k \le c\,n/\log(N/k)$$

for some constant $c$.

Combining Theorems 3.6 and 4.1 yields Theorem 2.3(b).


4.3. Diagonal matrices 

Diagonal matrices act as multiplication operators on $\mathbb{C}^n$. Using a Fourier expansion of the diagonal, we observe that any diagonal matrix can be expressed as a linear combination of the modulation operators $M_\ell \in \mathbb{C}^{n\times n}$, $\ell = 0, \ldots, n-1$, defined in (5). We now consider the case that only a small number of components of the output of a diagonal operator $\Gamma$ can be measured. The assumption that $\Gamma$ is sparse in the dictionary of modulation operators shall be used to recover $\Gamma$ from these components.

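To this end, let $D$ be a subset of $\{0, \ldots, n-1\}$ of cardinality $m$ and denote by $M_\ell^D \in \mathbb{C}^{m\times m}$ the submatrix of $M_\ell$ with columns and rows restricted to the index set $D$. Let

$$\mathcal{M}^D = \{M_\ell^D : \ell = 0, \ldots, n-1\}$$

and $h = \mathbf{1} = (1, \ldots, 1)^T$. If $\Gamma^D = \sum_{\ell=0}^{n-1} x_\ell M_\ell^D$, then $\Gamma^D \mathbf{1}$ coincides with the restriction of $\Gamma\mathbf{1} = \sum_{\ell=0}^{n-1} x_\ell M_\ell \mathbf{1}$ to the indices in $D$.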

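The matrix $A$ whose columns are the elements of the dictionary $(\mathcal{M}^D\mathbf{1}) = \{M_\ell^D \mathbf{1} : \ell = 0, \ldots, n-1\}$ is precisely a row submatrix of the Fourier matrix,

$$A = \big(e^{2\pi i r\ell/n}\big)_{r\in D,\ \ell = 0, \ldots, n-1} \in \mathbb{C}^{m\times n}.$$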
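The reduction to a row submatrix of the Fourier matrix can be verified numerically. In the sketch below, the names are hypothetical, and $n = 16$, $m = 6$ and the 2-sparse $x$ are arbitrary choices. The diagonal of $\Gamma = \sum_\ell x_\ell M_\ell$ is the Fourier expansion $d_r = \sum_\ell x_\ell e^{2\pi i \ell r/n}$, and its restriction to $D$ is compared with $Ax$.

```python
import cmath
import math
import random

rng = random.Random(3)
n, m = 16, 6
D = sorted(rng.sample(range(n), m))      # randomly chosen measured indices
x = [0j] * n
x[2], x[9] = 1.5, -0.5j                  # 2-sparse in the modulation dictionary

# Gamma = sum_l x_l M_l is diagonal with entries d_r = sum_l x_l exp(2 pi i l r / n)
diag = [sum(x[l] * cmath.exp(2j * math.pi * l * r / n) for l in range(n)) for r in range(n)]
measured = [diag[r] for r in D]          # (Gamma 1) restricted to D

# A = (exp(2 pi i r l / n))_{r in D, l = 0..n-1}, a row submatrix of the Fourier matrix
A = [[cmath.exp(2j * math.pi * r * l / n) for l in range(n)] for r in D]
Ax = [sum(a * xl for a, xl in zip(row, x)) for row in A]
```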


If the subset $D$ is chosen uniformly at random among all subsets of size $m$, then $A$ is a random matrix. This random partial Fourier matrix was studied in [0191113]; see also [15] for a slight variation. Indeed, under the condition

$$k \le c\,\frac{m}{\log^4(n)\,\log(\varepsilon^{-1})}$$

the restricted isometry property holds for the normalized matrix $m^{-1/2}A$ with probability at least $1 - \varepsilon$, and by Theorem 3.6 we obtain stable recovery of all matrices having a sparse representation in terms of $\mathcal{M}^D$.


5. TIME-FREQUENCY SHIFT DICTIONARIES 

In this section we establish coherence results for the dictionary of time-frequency shift matrices and prove Theorems 2.4 and 2.5.


5.1. Coherence for the time-frequency shift dictionary 


We apply known recovery results [22, EH EH Ea, 51] for dictionaries with small coherence (12). Assuming $\|h\|_2 = 1$, the coherence (12) of the Gabor system $\mathcal{G}h$ is

$$\mu = \max_{(\ell,p)\neq(\ell',p')} |\langle M_\ell T_p h, M_{\ell'} T_{p'} h\rangle|. \qquad (17)$$

Based on results by Alltop in [3], Strohmer and Heath showed in [m] that the coherence of $\mathcal{G}h^A$, with $h^A$ given in (7), satisfies

$$\mu = \frac{1}{\sqrt{n}} \qquad (18)$$

for $n$ prime. This is almost optimal, since the general lower bound in [m] for the coherence of frames with $n^2$ elements in $\mathbb{C}^n$ yields $\mu \ge 1/\sqrt{n+1}$.
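The value (18) and the lower bound $1/\sqrt{n+1}$ can be checked numerically for small primes. The sketch below (hypothetical names) computes the coherence of $\mathcal{G}h^A$ by brute force over all pairs of the $n^2$ atoms, assuming $(T_p h)_q = h_{(q-p) \bmod n}$. It uses primes $n \ge 5$. For $n = 3$ one has $q^3 \equiv q \pmod 3$, so $h^A$ is a modulated constant vector and its coherence equals 1; the last line confirms this.

```python
import cmath
import math

def alltop_window(n):
    return [cmath.exp(2j * math.pi * (q**3 % n) / n) / math.sqrt(n) for q in range(n)]

def gabor_atoms(h):
    n = len(h)
    return [[cmath.exp(2j * math.pi * l * q / n) * h[(q - p) % n] for q in range(n)]
            for l in range(n) for p in range(n)]

def coherence(cols):
    """Largest |<a_r, a_s>| over distinct columns, as in (12) and (17)."""
    best = 0.0
    for r in range(len(cols)):
        for s in range(r + 1, len(cols)):
            best = max(best, abs(sum(a * b.conjugate() for a, b in zip(cols[r], cols[s]))))
    return best

results = {n: coherence(gabor_atoms(alltop_window(n))) for n in (5, 7, 11, 13)}
mu3 = coherence(gabor_atoms(alltop_window(3)))   # degenerate case n = 3
```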


Unfortunately, the coherence result (18) for $\mathcal{G}h^A$ applies only for $n$ prime. For arbitrary $n$ we consider the random window $h^R$.

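Theorem 5.1. Let $n \in \mathbb{N}$ and choose a random window $h^R$ with entries

$$h^R_q = \frac{1}{\sqrt{n}}\,\epsilon_q, \qquad q = 0, \ldots, n-1,$$

where the $\epsilon_q$ are independent and uniformly distributed on the torus $\{z \in \mathbb{C} : |z| = 1\}$. Let $\mu$ be the coherence of the associated Gabor dictionary $\mathcal{G}h^R$. Then for $\alpha > 0$ and $n$ even,

$$\mathbb{P}\Big(\mu \ge \frac{\alpha}{\sqrt{n}}\Big) \le 4n(n-1)\,e^{-\alpha^2/4},$$

while for $n$ odd,

$$\mathbb{P}\Big(\mu \ge \frac{\alpha}{\sqrt{n}}\Big) \le 2n(n-1)\Big(e^{-\frac{n\alpha^2}{4(n-1)}} + e^{-\frac{n\alpha^2}{4(n+1)}}\Big). \qquad (19)$$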


Up to the constant factor $\alpha$, the coherence in Theorem 5.1 comes close to the lower bound $\mu \ge 1/\sqrt{n+1}$ with high probability. Theorems 2.4 and 2.5 will follow from these order $O(1/\sqrt{n})$ coherence results, from Theorems 3.1 and 3.2 of [22, 27, 52, 53], and from Theorems 3.3 and 3.4 of Tropp [54], respectively.
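The following small experiment illustrates Theorem 5.1. The pure-Python sketch below (hypothetical names) draws a random unimodular window for $n = 12$, computes the coherence of $\mathcal{G}h^R$ by brute force, and checks it against the lower bound $1/\sqrt{n+1}$. It also evaluates the even-$n$ tail bound $4n(n-1)e^{-\alpha^2/4}$. For $n = 12$ this bound exceeds 1 for every admissible $\alpha \le \sqrt{n}$. For $n = 4096$ and $\alpha = 10$ it is below $10^{-3}$, so $\mu < 10/64$ with probability above $0.999$.

```python
import cmath
import math
import random

def gabor_atoms(h):
    """The n^2 vectors M_l T_p h with (T_p h)_q = h_{(q - p) mod n}."""
    n = len(h)
    return [[cmath.exp(2j * math.pi * l * q / n) * h[(q - p) % n] for q in range(n)]
            for l in range(n) for p in range(n)]

def coherence(cols):
    best = 0.0
    for r in range(len(cols)):
        for s in range(r + 1, len(cols)):
            best = max(best, abs(sum(a * b.conjugate() for a, b in zip(cols[r], cols[s]))))
    return best

def tail_bound_even(n, alpha):
    """Right-hand side of Theorem 5.1 for n even: 4 n (n - 1) exp(-alpha^2 / 4)."""
    return 4 * n * (n - 1) * math.exp(-alpha**2 / 4)

rng = random.Random(6)
n = 12
h_r = [cmath.exp(2j * math.pi * rng.random()) / math.sqrt(n) for _ in range(n)]
mu = coherence(gabor_atoms(h_r))
alpha = mu * math.sqrt(n)     # write mu = alpha / sqrt(n)
```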


Proof of Theorem 5.1. The technical details for $n$ even and $n$ odd are slightly different. For conciseness we only state the proof for $n$ even, and outline the proof for $n$ odd.


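A direct computation shows that

$$|\langle M_\ell T_p h, M_{\ell'} T_{p'} h\rangle| = |\langle M_{\ell-\ell'} T_{p-p'} h, h\rangle|,$$

with indices understood modulo $n$. Therefore, it suffices to consider $\langle M_\ell T_p h^R, h^R\rangle$, $\ell, p = 0, \ldots, n-1$. Furthermore, as $\langle M_\ell h^R, h^R\rangle = \frac{1}{n}\sum_{q=0}^{n-1} e^{2\pi i \ell q/n} = 0$ for $\ell \neq 0$, we consider only the case $p \neq 0$.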
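The reduction at the start of the proof can be confirmed numerically. The sketch below (hypothetical names) draws a random unimodular window for $n = 6$. It compares $|\langle M_\ell T_p h, M_{\ell'} T_{p'} h\rangle|$ with $|\langle M_{\ell-\ell'} T_{p-p'} h, h\rangle|$ over all index quadruples, with indices taken modulo $n$, and it verifies that $\langle M_\ell h^R, h^R\rangle = 0$ for $\ell \neq 0$. The convention $(T_p h)_q = h_{(q-p) \bmod n}$ is assumed.

```python
import cmath
import itertools
import math
import random

def atom(h, l, p):
    """M_l T_p h with (T_p h)_q = h_{(q - p) mod n}; negative l and p act modulo n."""
    n = len(h)
    return [cmath.exp(2j * math.pi * l * q / n) * h[(q - p) % n] for q in range(n)]

def inner(a, b):
    return sum(x * y.conjugate() for x, y in zip(a, b))

rng = random.Random(7)
n = 6
h = [cmath.exp(2j * math.pi * rng.random()) / math.sqrt(n) for _ in range(n)]

worst = 0.0
for l, p, l2, p2 in itertools.product(range(n), repeat=4):
    lhs = abs(inner(atom(h, l, p), atom(h, l2, p2)))
    rhs = abs(inner(atom(h, l - l2, p - p2), h))
    worst = max(worst, abs(lhs - rhs))

zero_check = max(abs(inner(atom(h, l, 0), h)) for l in range(1, n))
```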

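Writing $\epsilon_q = e^{2\pi i y_q}$ with $y_q \in [0,1)$, we obtain

$$\langle M_\ell T_p h^R, h^R\rangle = \frac{1}{n}\sum_{q=0}^{n-1} e^{2\pi i \ell q/n}\,\epsilon_{q-p}\,\overline{\epsilon_q} = \frac{1}{n}\sum_{q=0}^{n-1} e^{2\pi i (y_{q-p} - y_q + \ell q/n)},$$

where $\epsilon_{q-p} = \epsilon_{n+q-p}$ if $q - p < 0$, that is, the indices are understood modulo $n$. Set

$$\delta_q^{(p,\ell)} = e^{2\pi i (y_{q-p} - y_q + \ell q/n)}$$

and note that each $\delta_q^{(p,\ell)}$ is uniformly distributed on the torus $\mathbb{T}$. However, the $\delta_q^{(p,\ell)}$, $q = 0, \ldots, n-1$, are no longer jointly independent. Nevertheless, as we demonstrate in the following, we can split all variables into two subsets of independent variables.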

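If $p = 1$, $p = n-1$, or if neither $p$ nor $n-p$ divides $n$, then the $n/2$ random variables $\epsilon_0\overline{\epsilon_p}, \epsilon_p\overline{\epsilon_{2p}}, \ldots, \epsilon_{p(n/2-1)}\overline{\epsilon_{pn/2}}$ are jointly independent, as are the remaining $n/2$ variables $\epsilon_{pn/2}\overline{\epsilon_{p(n/2+1)}}, \ldots, \epsilon_{p(n-1)}\overline{\epsilon_0}$. The indices are again understood modulo $n$. If $p \ge 2$ or $n - p \ge 2$ divides $n$, then we form the $p$ random vectors

$$Y_1 = \big(\epsilon_0\overline{\epsilon_p},\ \epsilon_p\overline{\epsilon_{2p}},\ \ldots,\ \epsilon_{n-p}\overline{\epsilon_0}\big),$$
$$Y_2 = \big(\epsilon_1\overline{\epsilon_{p+1}},\ \epsilon_{p+1}\overline{\epsilon_{2p+1}},\ \ldots,\ \epsilon_{n-p+1}\overline{\epsilon_1}\big),$$
$$\vdots$$
$$Y_p = \big(\epsilon_{p-1}\overline{\epsilon_{2p-1}},\ \epsilon_{2p-1}\overline{\epsilon_{3p-1}},\ \ldots,\ \epsilon_{n-1}\overline{\epsilon_{p-1}}\big).$$

These vectors are jointly independent. Moreover, $p < n/2$ allows partitioning the entries of a single vector $Y_j$ into two sets $A_j^1$ and $A_j^2$ with $|A_j^1|, |A_j^2| \ge 1$ such that the elements of each set are jointly independent. Indeed, this can be seen by forming subsets of two adjacent elements of the form $\{\epsilon_{k+jp}\overline{\epsilon_{k+(j+1)p}},\ \epsilon_{k+(j+1)p}\overline{\epsilon_{k+(j+2)p}}\}$, with possibly a remaining single element subset. Then all subsets are jointly independent, and the two elements inside a subset are independent as well.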

Now by forming unions and U^^^A? we can always partition the index set {0,..., n—1} into two 

subsets Ai, A 2 C {0,... ,n—1} with |Ai| = IA 2 I = n/2 such that the random variables G A*} are 

jointly independent for both i = 1,2. 
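The splitting argument can be sanity-checked combinatorially. Index $q$ contributes the product $\epsilon_{q-p}\overline{\epsilon_q}$, which we view as an edge between the vertices $q-p$ and $q$. Products attached to the edges of a graph without cycles are jointly independent and uniform on the torus; a closed cycle is not. The sketch below uses a simplified variant of the construction in the text: it cuts each orbit of $q \mapsto q+p$ into two paths instead of pairing adjacent elements. `split_indices` and `is_forest` are hypothetical names chosen for illustration.

```python
def split_indices(n, p):
    """Split {0, ..., n-1} into two index sets for the products eps_{q-p} * conj(eps_q).

    Simplified variant of the construction in the text: follow each cycle of
    q -> q + p (mod n), give its first part to one set and the rest to the other,
    alternating which set receives the extra element of an odd-length cycle.
    """
    seen = [False] * n
    sets = ([], [])
    extra_to_first = True
    for start in range(n):
        if seen[start]:
            continue
        cycle, q = [], start
        while not seen[q]:
            seen[q] = True
            cycle.append(q)
            q = (q + p) % n
        cut = len(cycle) // 2
        if len(cycle) % 2 == 1:
            if extra_to_first:
                cut += 1
            extra_to_first = not extra_to_first
        sets[0].extend(cycle[:cut])
        sets[1].extend(cycle[cut:])
    return sets


def is_forest(edges, n):
    """Union-find check that the multigraph on vertices 0..n-1 contains no cycle."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        parent[ra] = rb
    return True


n, p = 12, 4
lam1, lam2 = split_indices(n, p)
# Index q contributes the edge (q - p, q): the two torus variables inside delta_q.
print(sorted(lam1), sorted(lam2))
print(is_forest([((q - p) % n, q) for q in lam1], n),
      is_forest([((q - p) % n, q) for q in lam2], n))
```

For even $n$ all orbits have the same length $n/\gcd(n,p)$, and when that length is odd there is an even number of orbits, so the alternation keeps both sets at size $n/2$.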


In the following, we will use the complex Bernstein inequality; see, for example, [54, Proposition 15] and [42]. It states that for an independent sequence $\epsilon_q$, $q = 1, \ldots, n$, of random variables which are uniformly distributed on the torus,

$$\mathbb{P}\Big(\Big|\sum_{q=1}^{n} \epsilon_q\Big| \geq nu\Big) \leq 2e^{-nu^2/2}. \qquad (20)$$
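A small Monte Carlo experiment illustrates inequality (20). This is a sketch under our own choices ($n = 50$, 2000 trials per threshold, a fixed seed). It only shows that the empirical tail frequencies sit below the bound, not that the bound is sharp.

```python
import cmath
import math
import random


def tail_frequency(n, u, trials, rng):
    """Empirical P(|eps_1 + ... + eps_n| >= n u) for i.i.d. eps_q uniform on the torus."""
    hits = 0
    for _ in range(trials):
        total = sum(cmath.exp(2j * math.pi * rng.random()) for _ in range(n))
        if abs(total) >= n * u:
            hits += 1
    return hits / trials


def bernstein_bound(n, u):
    """Right-hand side of (20): 2 exp(-n u^2 / 2)."""
    return 2.0 * math.exp(-n * u * u / 2.0)


rng = random.Random(1)
n = 50
results = []
for u in (0.1, 0.2, 0.3):
    results.append((u, tail_frequency(n, u, 2000, rng), bernstein_bound(n, u)))
    print(results[-1])
```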


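Using the pigeonhole principle and inequality (20), applied to each of the two sets of $n/2$ independent variables, we obtain

$$\begin{aligned}
\mathbb{P}(|\langle M_\ell T_p h^R, h^R\rangle| \geq t) &= \mathbb{P}\Big(\Big|\sum_{q=0}^{n-1}\delta_q^{(p,\ell)}\Big| \geq nt\Big)\\
&\leq \mathbb{P}\Big(\Big|\sum_{q\in\Lambda_1}\delta_q^{(p,\ell)}\Big| \geq \frac{nt}{2}\Big) + \mathbb{P}\Big(\Big|\sum_{q\in\Lambda_2}\delta_q^{(p,\ell)}\Big| \geq \frac{nt}{2}\Big)\\
&\leq 4\exp(-nt^2/4).
\end{aligned}$$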


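Forming the union bound over all possible $(p,\ell) \in \{0, \ldots, n-1\}^2 \setminus \{(0,0)\}$, of which only the $n(n-1)$ pairs with $p \neq 0$ contribute, and choosing $t = \alpha/\sqrt{n}$ yields the statement of Theorem 5.1 for $n$ even.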
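Written out, the union bound and the choice $t = \alpha/\sqrt{n}$ give the first display below. The second display inverts it for a prescribed failure probability $\tau$; $\tau$ is our own illustrative notation and does not appear in Theorem 5.1.

$$\mathbb{P}\Big(\mu \geq \frac{\alpha}{\sqrt{n}}\Big) \leq \sum_{\ell=0}^{n-1}\sum_{p=1}^{n-1}\mathbb{P}\Big(|\langle M_\ell T_p h^R, h^R\rangle| \geq \frac{\alpha}{\sqrt{n}}\Big) \leq n(n-1)\cdot 4\exp\Big(-\frac{n}{4}\cdot\frac{\alpha^2}{n}\Big) = 4n(n-1)e^{-\alpha^2/4},$$

$$4n(n-1)e^{-\alpha^2/4} = \tau \iff \alpha = 2\sqrt{\log\frac{4n(n-1)}{\tau}}, \quad\text{so}\quad \mathbb{P}\Big(\mu < 2\sqrt{\frac{\log(4n(n-1)/\tau)}{n}}\Big) \geq 1 - \tau.$$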


The proof of Theorem 5.1 for $n$ odd uses essentially the same technique as for $n$ even, with the difference that the random variables are grouped into sets of unequal cardinality, $|\Lambda_1| = (n-1)/2$ and $|\Lambda_2| = (n+1)/2$. For large $n$ the probability tail bounds are nearly the same for $n$ even (21) and $n$ odd (19). □
5.2. Proof of Theorem 2.4


Part (a) follows directly from Theorem 3.1 and the coherence of $\Psi_{h^A}$ given in (18).


Part (b) follows from Theorem 3.1 and Theorem 5.1. In fact, the probability that the condition $\mu < (2k-1)^{-1}$ of Theorem 3.1 does not hold for $\Psi_{h^R}$ is estimated by

$$\mathbb{P}\big(\mu \geq (2k-1)^{-1}\big) \leq 4n^2\exp\Big(-\frac{n}{4(2k-1)^2}\Big).$$

Requiring that the latter term is less than $\varepsilon$ and solving for $k$ gives the sparsity condition of Theorem 2.4(b). □
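The sparsity condition obtained in this way can be evaluated directly. This is a minimal sketch built on the bound $4n^2\exp(-n/(4(2k-1)^2))$ displayed above. The function names and the sample values $n = 10^4$ and $\varepsilon = 0.01$ are illustrative only.

```python
import math


def max_sparsity(n, eps):
    """Real-valued bound k* = 1/2 + (1/4) sqrt(n / (2 log n + log 4 + log(1/eps)))."""
    return 0.5 + 0.25 * math.sqrt(n / (2 * math.log(n) + math.log(4) + math.log(1 / eps)))


def coherence_failure_bound(n, k):
    """4 n^2 exp(-n / (4 (2k - 1)^2)), the bound on P(mu >= 1/(2k - 1))."""
    return 4 * n**2 * math.exp(-n / (4 * (2 * k - 1) ** 2))


n, eps = 10_000, 0.01
k_star = max_sparsity(n, eps)
k = math.floor(k_star)  # largest admissible integer sparsity
print(k_star, k, coherence_failure_bound(n, k))
```

At the real-valued boundary $k^*$ the failure bound equals $\varepsilon$ exactly. Because the bound increases with $k$, every integer $k \leq k^*$ stays below $\varepsilon$.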
5.3. Proof of Theorem 2.5


Having established coherence results for $\Psi_{h^A}$ and $\Psi_{h^R}$ in Section 5.1, Theorem 2.5 follows from Theorems 3.3 and 3.4 of Tropp [51], as shown below.


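(a) Recall from (18) that the coherence of $\Psi_{h^A}$ satisfies $\mu = n^{-1/2}$. Next, observe that the unimodularity of $h^A$ implies that the columns of $\Psi_{h^A}$ form $n$ orthonormal bases, and, hence, $\|\Psi_{h^A}\|_{2\to2}^2 = \|\Psi_{h^A}\Psi_{h^A}^*\|_{2\to2} = \|n\,\mathrm{Id}\|_{2\to2} = n$.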


Plugging this into condition (13) of Tropp's theorem with $\delta = 1/2$, we require that

$$\sqrt{\frac{144\, s\, k\log(k/2+1)}{n}} + \frac{2k}{n} = \frac{e^{-1/4}}{2}.$$


Solving for $s$ yields (11). Applying Theorem 3.4, which requires $s \geq 1$, we conclude that $\|A_\Lambda^*A_\Lambda - \mathrm{Id}\|_{2\to2} \leq 1/2$, $A = \Psi_{h^A}$, with probability at least $1 - (k/2)^{-s}$.

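Now let $\delta = \|A_\Lambda^*A_\Lambda - \mathrm{Id}\|_{2\to2}$. Then

$$\mathbb{P}(\text{BP does not recover } \Gamma \text{ from } \Gamma h^A) \leq \mathbb{P}(\text{BP does not recover } \Gamma \text{ from } \Gamma h^A \mid \delta \leq 1/2) + \mathbb{P}(\delta > 1/2).$$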


Thus, by Theorems 3.3 and 3.4, we can lower bound the probability that recovery is successful by

$$1 - \Big((k/2)^{-s} + 2n^2\exp\Big(-\frac{n}{8k}\Big)\Big).$$

Furthermore, observe that $2n^2\exp(-n/(8k)) \leq \varepsilon$ under condition (10).
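For concrete values of $n$ and $k$, the quantity $s$ and the resulting lower bound on the success probability in part (a) can be computed as follows. This is a sketch. Here $s$ is obtained by solving the displayed equation, which is our reading of (11), and the parameters $n = 100003$, $k \in \{5, 10, 20\}$ and $\varepsilon = 10^{-3}$ are arbitrary choices for which $s \geq 1$.

```python
import math


def s_alltop(n, k):
    """Solve sqrt(144 s k log(k/2 + 1) / n) + 2k/n = exp(-1/4) / 2 for s."""
    gap = math.exp(-0.25) / 2 - 2 * k / n
    if gap <= 0:
        raise ValueError("need 2k/n < exp(-1/4)/2")
    return n * gap**2 / (144 * k * math.log(k / 2 + 1))


def recovery_lower_bound(n, k):
    """1 - ((k/2)^(-s) + 2 n^2 exp(-n / (8k))); only meaningful when s >= 1."""
    s = s_alltop(n, k)
    return 1 - ((k / 2) ** (-s) + 2 * n**2 * math.exp(-n / (8 * k)))


n, eps = 100_003, 1e-3  # n = 100003 is prime, as the Alltop window requires
k_cond10 = math.floor(n / (8 * math.log(2 * n**2 / eps)))  # largest k allowed by condition (10)
for k in (5, 10, 20):
    print(k, s_alltop(n, k), recovery_lower_bound(n, k))
print("condition (10) allows k up to", k_cond10)
```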
(b) Let $\mu$ be the coherence associated with the random Gabor window $h^R$. Setting $\alpha^2 = \rho\log n$ in Theorem 5.1, we obtain that the probability that $\mu$ exceeds $\sqrt{\rho\log n/n}$ is smaller than

$$4n(n-1)\exp(-\alpha^2/4) < 4n^{2-\rho/4}.$$


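Set $\sigma = \rho/4 - 2$, i.e., $\rho = 4(\sigma+2)$, so that $4n^{2-\rho/4} = 4n^{-\sigma}$, and assume for the moment that $\mu \leq \sqrt{\rho\log n/n}$. Then condition (13) of Theorem 3.4 with $\delta = 1/2$ is satisfied if

$$\sqrt{\frac{144\cdot 4(\sigma+2)\log n\cdot s\,k\log(k/2+1)}{n}} + \frac{2k}{n} = \frac{e^{-1/4}}{2}.$$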


Requiring $s \geq 1$ yields condition (22). Invoking Theorem 3.4, we obtain that $\|A_\Lambda^*A_\Lambda - \mathrm{Id}\|_{2\to2} \leq 1/2$, $A = \Psi_{h^R}$, with probability at least $1 - (k/2)^{-s}$.

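Similarly to the proof of part (a), we estimate the probability of successful recovery by

$$\begin{aligned}
\mathbb{P}(\text{BP recovers } \Gamma \text{ from } \Gamma h^R) \geq 1 - \Big(&\mathbb{P}\big(\text{BP does not recover } \Gamma \text{ from } \Gamma h^R \,\big|\, \delta \leq 1/2 \ \&\ \mu^2 \leq \tfrac{\rho\log n}{n}\big)\\
&+ \mathbb{P}\big(\delta > 1/2 \,\big|\, \mu^2 \leq \tfrac{\rho\log n}{n}\big) + \mathbb{P}\big(\mu^2 > \tfrac{\rho\log n}{n}\big)\Big).
\end{aligned}$$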


By Theorem 3.3, the probability that $\Gamma$ can be reconstructed from $\Gamma h^R$ by Basis Pursuit therefore exceeds

$$1 - \Big(2n^2\exp\Big(-\frac{n}{8\rho\log(n)\,k}\Big) + (k/2)^{-s} + 4n^{-\sigma}\Big).$$

Finally, observe that the term $2n^2\exp(-n/(8\rho\log(n)\,k))$ is less than $\varepsilon$ provided

$$k \leq \frac{n}{32(\sigma+2)\log(n)\log(2n^2/\varepsilon)}.$$
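The three failure terms of part (b), and the boundary case of the final sparsity condition, can be evaluated in the same way. This is a sketch with hypothetical function names and the illustrative values $\sigma = 1$ and $\varepsilon = 0.01$. Printing the admissible $k$ for growing $n$ shows how large $n$ must be before this bound permits a sizeable sparsity.

```python
import math


def failure_terms_part_b(n, k, sigma, s):
    """The three failure terms of the part (b) bound, with rho = 4 (sigma + 2)."""
    rho = 4 * (sigma + 2)
    bp_term = 2 * n**2 * math.exp(-n / (8 * rho * math.log(n) * k))
    conditioning_term = (k / 2) ** (-s)
    coherence_term = 4 * n ** (-sigma)  # equals 4 n^(2 - rho/4)
    return bp_term, conditioning_term, coherence_term


def k_max_part_b(n, sigma, eps):
    """Right-hand side of the final sparsity condition in part (b)."""
    return n / (32 * (sigma + 2) * math.log(n) * math.log(2 * n**2 / eps))


sigma, eps = 1.0, 0.01
for n in (10**4, 10**6, 10**8):
    print(n, k_max_part_b(n, sigma, eps))
```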


5.4. Proof of Corollary 2.6

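Plancherel's theorem and $\widehat{M_\ell T_p h} = T_\ell M_{-p}\hat h = c\, M_{n-p}T_\ell\hat h$ with $|c| = 1$ imply that the coherence remains the same under Fourier transform of the window, that is,

$$\begin{aligned}
\mu_h &= \sup_{(\ell,p)\neq(\ell',p')} |\langle M_\ell T_p h, M_{\ell'}T_{p'}h\rangle| = \sup_{(\ell,p)\neq(\ell',p')} |\langle \widehat{M_\ell T_p h}, \widehat{M_{\ell'}T_{p'}h}\rangle|\\
&= \sup_{(\ell,p)\neq(\ell',p')} |\langle M_{n-p}T_\ell\hat h, M_{n-p'}T_{\ell'}\hat h\rangle| = \mu_{\hat h}.
\end{aligned}$$

Since all of the results concerning the dictionary of time-frequency shift matrices stated above are based on the coherence, this proves the claim.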
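The invariance of the coherence under the Fourier transform of the window can be confirmed numerically. This sketch assumes the unitary DFT $\hat h_k = n^{-1/2}\sum_q h_q e^{-2\pi i qk/n}$. It computes the coherence through the reduction used in the proof of Theorem 5.1, which holds for any unit-norm window, and the test window is a seeded complex Gaussian vector normalized to unit norm.

```python
import cmath
import math
import random


def tf_shift(h, l, p):
    n = len(h)
    return [cmath.exp(2j * math.pi * l * q / n) * h[(q - p) % n] for q in range(n)]


def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))


def coherence(h):
    """Coherence of Psi_h for unit-norm h: max over (l, p) != (0, 0) of |<M_l T_p h, h>|."""
    n = len(h)
    return max(abs(inner(tf_shift(h, l, p), h))
               for l in range(n) for p in range(n) if (l, p) != (0, 0))


def unitary_dft(h):
    """hat h_k = n^(-1/2) sum_q h_q exp(-2 pi i q k / n)."""
    n = len(h)
    return [sum(h[q] * cmath.exp(-2j * math.pi * q * k / n) for q in range(n)) / math.sqrt(n)
            for k in range(n)]


rng = random.Random(2)
n = 8
h = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
norm = math.sqrt(sum(abs(v) ** 2 for v in h))
h = [v / norm for v in h]
mu_h, mu_hat = coherence(h), coherence(unitary_dft(h))
print(mu_h, mu_hat)
```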
6. MULTIPLE TEST VECTORS


In addition to the goal of recovering the operator $\Gamma$ from the operator output caused by a single test signal, we may also consider using two or more test signals $h_1, \ldots, h_r$ to identify $\Gamma$. In this case, the vector of concatenated observations $\Gamma h_1, \ldots, \Gamma h_r$ is given as

$$\begin{pmatrix}\Gamma h_1\\ \vdots\\ \Gamma h_r\end{pmatrix} = \begin{pmatrix}\Psi_{h_1}\\ \vdots\\ \Psi_{h_r}\end{pmatrix} x,$$

and our sparse matrix identification task is again reduced to a sparse signal recovery problem. Although we 
will not pursue this task in depth here, we will make some remarks and state extensions of our results to 
this more general setting. 


Intuitively, using several test vectors instead of a single one should increase the maximal sparsity $k$ that allows for perfect reconstruction, as more information can be exploited. However, it is only interesting to consider $r < m$, since any operator $\Gamma$ with domain $\mathbb{C}^m$ is characterized by its action on $m$ basis vectors. The following lemma on the coherence of concatenated measurement matrices suggests that the maximal recoverable sparsity does not decrease. Its proof is straightforward and therefore omitted.

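Lemma 6.1. Let $h_1, \ldots, h_r \in \mathbb{C}^m$ be such that the matrices $\Psi_{h_j}$ have coherence $\mu_j$. Then the coherence $\mu$ of the normalized concatenated matrix

$$\frac{1}{\sqrt{r}}\begin{pmatrix}\Psi_{h_1}\\ \Psi_{h_2}\\ \vdots\\ \Psi_{h_r}\end{pmatrix}$$

satisfies $\mu \leq \frac{1}{r}(\mu_1 + \mu_2 + \cdots + \mu_r) \leq \max_{j=1,\ldots,r}\mu_j$.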
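Lemma 6.1 is easy to check on a small example. The sketch below uses hypothetical helper names and $r = 3$ seeded random unimodular windows of length $n = 5$. It builds the normalized concatenation column by column and compares its coherence with the average and the maximum of the individual coherences.

```python
import cmath
import math
import random


def tf_shift(h, l, p):
    n = len(h)
    return [cmath.exp(2j * math.pi * l * q / n) * h[(q - p) % n] for q in range(n)]


def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))


def psi_columns(h):
    """Columns M_l T_p h of Psi_h, ordered by (l, p)."""
    n = len(h)
    return [tf_shift(h, l, p) for l in range(n) for p in range(n)]


def coherence(cols):
    """Largest |<c_i, c_j>| over distinct columns (columns assumed to have unit norm)."""
    return max(abs(inner(cols[i], cols[j]))
               for i in range(len(cols)) for j in range(i + 1, len(cols)))


rng = random.Random(3)
n, r = 5, 3
windows = [[cmath.exp(2j * math.pi * rng.random()) / math.sqrt(n) for _ in range(n)]
           for _ in range(r)]
per_window = [psi_columns(h) for h in windows]
mus = [coherence(cols) for cols in per_window]

# Normalized concatenation: stack the r columns belonging to the same (l, p), scale by 1/sqrt(r).
stacked = [[v / math.sqrt(r) for col in group for v in col] for group in zip(*per_window)]
mu = coherence(stacked)
print(mus, mu, sum(mus) / r)
```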


A straightforward extension of the proof of Theorem 5.1 yields the following result in the setting of time-frequency shifts and several randomly chosen $h_j^R$, $j = 1, \ldots, r$.

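Theorem 6.2. Let $n \in \mathbb{N}$ be even and choose random windows $h_j^R$, $j = 1, \ldots, r$, with entries

$$(h_j^R)_q = \frac{1}{\sqrt{n}}\,\epsilon_{q,j}, \qquad q = 0, \ldots, n-1,$$

where the $\epsilon_{q,j}$ are independent and uniformly distributed on the torus $\{z \in \mathbb{C}, |z| = 1\}$. Let $\mu$ be the coherence of the concatenated matrix

$$\frac{1}{\sqrt{r}}\begin{pmatrix}\Psi_{h_1^R}\\ \vdots\\ \Psi_{h_r^R}\end{pmatrix},$$

where $\Psi_h$ is defined as before. Then for $\alpha > 0$

$$\mathbb{P}\Big(\mu \geq \frac{\alpha}{\sqrt{rn}}\Big) \leq 4n(n-1)e^{-\alpha^2/4}. \qquad (21)$$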

Similarly as in Theorem 2.4(b), we deduce that the condition

$$k \leq \frac{1}{4}\sqrt{\frac{rn}{2\log n + \log 4 + \log\varepsilon^{-1}}} + \frac{1}{2}$$

implies that Basis Pursuit (or Orthogonal Matching Pursuit) recovers all $k$-sparse $\Gamma$ from $\Gamma h_1^R, \ldots, \Gamma h_r^R$ with probability at least $1 - \varepsilon$. Hence, the maximal provable sparsity increases at least by a factor of $\sqrt{r}$.
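The $\sqrt{r}$ gain can be read off from the condition above. This is a sketch with the illustrative values $n = 1024$ and $\varepsilon = 0.01$ and hypothetical function names. It checks that the bound minus $1/2$ scales exactly like $\sqrt{r}$, and that at the boundary value of $k$ the coherence failure bound from (21), with $\alpha = \sqrt{rn}/(2k-1)$, equals $\varepsilon$.

```python
import math


def k_bound(n, r, eps):
    """(1/4) sqrt(r n / (2 log n + log 4 + log(1/eps))) + 1/2 for r concatenated windows."""
    return 0.25 * math.sqrt(r * n / (2 * math.log(n) + math.log(4) + math.log(1 / eps))) + 0.5


def coherence_failure_bound(n, r, k):
    """4 n^2 exp(-r n / (4 (2k - 1)^2)), i.e. (21) with alpha = sqrt(r n) / (2k - 1)."""
    return 4 * n**2 * math.exp(-r * n / (4 * (2 * k - 1) ** 2))


n, eps = 1024, 0.01
base = k_bound(n, 1, eps) - 0.5
for r in (1, 2, 4, 8):
    print(r, k_bound(n, r, eps), (k_bound(n, r, eps) - 0.5) / base)
```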


Of course, we may as well apply Tropp's result based on random support sets and phases to arrive at a statement analogous to Theorem 2.5.


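Theorem 6.3. Let $n$ be even and $k \geq 3$, and let $\Lambda$ be chosen uniformly at random among all subsets of $\{0, \ldots, n-1\}^2$ of cardinality $k$. Suppose further that $x \in \mathbb{C}^{n^2}$ has support $\Lambda$ with random phases $(\mathrm{sgn}(x_{\ell,p}))_{(\ell,p)\in\Lambda}$ that are independent and uniformly distributed on the torus $\{z, |z| = 1\}$. Let

$$\Gamma = \sum_{(\ell,p)\in\Lambda} x_{\ell,p}\, M_\ell T_p.$$

Choose $r$ independent random windows $h_1^R, \ldots, h_r^R$ as in Theorem 6.2. Assume

$$k \leq \frac{rn}{32(\sigma+2)\log n\,\log(2n^2/\varepsilon)}$$

for some $\sigma > 0$ and

$$s := \frac{1}{576(\sigma+2)}\Big(\frac{e^{-1/4}}{2} - \frac{2k}{n}\Big)^2\frac{rn}{k\log n\,\log(k/2+1)} \geq 1. \qquad (22)$$

Then with probability at least

$$1 - \big(\varepsilon + 4n^{-\sigma} + (k/2)^{-s}\big)$$

Basis Pursuit recovers $\Gamma$ from $\Gamma h_1^R, \ldots, \Gamma h_r^R$.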
Figure 1. (a) Original 7-sparse coefficient vector ($n = 59$) in the time-frequency plane. (b) Reconstruction by Basis Pursuit using the Alltop window $h^A$. (c) For comparison, the reconstruction by traditional $\ell_2$-minimization (23).


Roughly speaking, with the chosen probabilistic model on the sparse coefficient vector $x$, the provable maximal sparsity $k$ that allows for recovery increases by a factor of $r$ when taking $r$ test vectors instead of only one. This fact is illustrated in Figure 5 in Section 7.


7. NUMERICAL RESULTS 


Theorem 2.5 can be tested empirically for various values of $n$ by trying a number of sparsity levels $k$ and recording the fraction of times Basis Pursuit recovers the true $k$-sparse coefficient vector $x$.


Before doing so, we illustrate in Figure 1 the recovery method for matrices which have a sparse representation in the dictionary of time-frequency shift matrices, as considered in Theorem 2.5. A 7-sparse coefficient vector $x$ in the time-frequency plane is chosen and reconstructed from $\Gamma h^A = \sum x_{\ell,p} M_\ell T_p h^A$ by Basis Pursuit. For comparison, $x$ is also reconstructed by traditional $\ell_2$-minimization,

$$\min\|x\|_2 \quad\text{subject to}\quad \Psi_{h^A}x = \Gamma h^A. \qquad (23)$$
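Why $\ell_2$-minimization spreads the energy, as in Figure 1(c), can be seen from a short computation. For any unit-norm window, $\Psi_h\Psi_h^* = n\,\mathrm{Id}$, so the solution of (23) is simply $x_2 = \Psi_h^*\Gamma h/n$, which is generically nonzero on almost all $n^2$ entries. The sketch below assumes the Alltop window in its standard form $h_q = n^{-1/2}e^{2\pi i q^3/n}$ ($n$ prime, $n \geq 5$). The 3-sparse test vector and the helper names are made up for illustration.

```python
import cmath
import math


def tf_shift(h, l, p):
    n = len(h)
    return [cmath.exp(2j * math.pi * l * q / n) * h[(q - p) % n] for q in range(n)]


def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))


def alltop_window(n):
    """Alltop window h_q = exp(2 pi i q^3 / n) / sqrt(n), for n prime >= 5."""
    return [cmath.exp(2j * math.pi * (q**3 % n) / n) / math.sqrt(n) for q in range(n)]


n = 7
h = alltop_window(n)
cols = {(l, p): tf_shift(h, l, p) for l in range(n) for p in range(n)}

# Hypothetical 3-sparse coefficients x_{l,p}; all other entries are zero.
x = {(1, 2): 1.0, (4, 0): -0.5j, (6, 5): 0.8}
y = [sum(c * cols[key][q] for key, c in x.items()) for q in range(n)]  # y = Gamma h

# Minimum l2-norm solution of Psi_h x = y: x2 = Psi_h^* y / n, since Psi_h Psi_h^* = n Id.
x2 = {key: inner(y, col) / n for key, col in cols.items()}

y2 = [sum(c * cols[key][q] for key, c in x2.items()) for q in range(n)]
residual = max(abs(a - b) for a, b in zip(y, y2))
nonzeros = sum(1 for v in x2.values() if abs(v) > 1e-9)
print(f"residual {residual:.1e}, nonzero entries {nonzeros} of {n * n}")
```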


For the Alltop window $h^A$ we consider the prime values of $n$ from 11 to 59; for the random window $h^R$ we consider the prime values of $n$ from 11 to 59 as well as $n = 10 + 4j$ for $j = 0, 1, \ldots, 12$. Each empirical test consists of generating a random $k$-sparse $x \in \mathbb{C}^{n^2}$ with non-zero entries $x_q = r_q\exp(2\pi i\theta_q)$, with $r_q$ drawn independently from the Gaussian $N(0,1)$ distribution and $\theta_q$ drawn independently and uniformly from $[0,1)$.
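The random test vectors described above can be generated as follows. This is a sketch. The support is drawn uniformly among all $k$-subsets of the $n^2$ indices, which is our assumption for "random $k$-sparse", and the function name is hypothetical.

```python
import cmath
import math
import random


def random_sparse_coefficients(n, k, rng):
    """Return (x, support): x in C^(n^2) with k non-zeros x_q = r_q exp(2 pi i theta_q)."""
    support = sorted(rng.sample(range(n * n), k))  # uniformly random support (assumption)
    x = [0j] * (n * n)
    for q in support:
        r_q = rng.gauss(0.0, 1.0)   # r_q ~ N(0, 1)
        theta_q = rng.random()      # theta_q ~ U[0, 1)
        x[q] = r_q * cmath.exp(2j * math.pi * theta_q)
    return x, support


rng = random.Random(4)
x, support = random_sparse_coefficients(11, 7, rng)
print(support)
```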
[Plot: fraction of successful recovery and logistic regression fit versus k/n]

Figure 2. Empirical verification of Theorem 2.5 without noise. For the random window with $n = 30$, the mean response of $Y_n^R$ (dash-dot) and the fitted logistic regression model $E(Y_n^R)$ (solid) are plotted against the fractional sparsity $k/n$. For the Alltop window with $n = 43$, the mean response of $Y_n^A$ (dot) and the fitted logistic regression model $E(Y_n^A)$ (dash) are plotted against the fractional sparsity $k/n$.


For each value of $n$, 1000 tests are computed per value of $k = 1, 2, \ldots, n-1$. A test is considered successful if Basis Pursuit recovers all components of the coefficient vector $x$ within an error tolerance of $10^{-\ldots}$. The successful recovery of $x$, and, hence, of $\Gamma$ from $\Gamma h^A$ or $\Gamma h^R$, is recorded in $Y_n^A$ or $Y_n^R$ as a 1, and failure to recover as a 0. Following the empirical examination of phase transitions in [18], we approximate the observed probability distribution by fitting the mean response of $Y_n$ using the logistic regression model

$$E(Y_n) = \frac{\exp(\beta_0(n) + \beta_1(n)k)}{1 + \exp(\beta_0(n) + \beta_1(n)k)}. \qquad (24)$$
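Model (24) can be fitted by maximum likelihood. The sketch below is one standard way to do so, Newton's method with step halving on the binomial log-likelihood, and not necessarily the procedure used for the figures. It is demonstrated on synthetic counts generated from the made-up coefficients $\beta_0 = 6$, $\beta_1 = -0.5$ with 1000 trials per $k$.

```python
import math


def fit_logistic(ks, successes, trials, max_iter=100):
    """Maximum-likelihood fit of E(Y) = exp(b0 + b1 k) / (1 + exp(b0 + b1 k)).

    successes[i] out of trials[i] tests succeeded at sparsity ks[i].
    """
    def log1pexp(z):
        # Numerically stable log(1 + e^z).
        return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))

    def loglik(b0, b1):
        return sum(s * (b0 + b1 * k) - t * log1pexp(b0 + b1 * k)
                   for k, s, t in zip(ks, successes, trials))

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))

    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Gradient g and negated Hessian H of the log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for k, s, t in zip(ks, successes, trials):
            prob = sigmoid(b0 + b1 * k)
            resid, w = s - t * prob, t * prob * (1.0 - prob)
            g0, g1 = g0 + resid, g1 + resid * k
            h00, h01, h11 = h00 + w, h01 + w * k, h11 + w * k * k
        det = h00 * h11 - h01 * h01
        if det <= 0:
            break
        d0 = (h11 * g0 - h01 * g1) / det  # Newton direction H^{-1} g
        d1 = (h00 * g1 - h01 * g0) / det
        step, current = 1.0, loglik(b0, b1)
        # Halve the step until the log-likelihood does not decrease (or the step is tiny).
        while step > 1e-8 and loglik(b0 + step * d0, b1 + step * d1) < current:
            step /= 2
        b0, b1 = b0 + step * d0, b1 + step * d1
        if abs(step * d0) + abs(step * d1) < 1e-10:
            break
    return b0, b1


# Synthetic example: counts generated from known coefficients, 1000 trials per k.
true_b0, true_b1 = 6.0, -0.5
ks = list(range(1, 31))
trials = [1000] * len(ks)
successes = [round(1000 / (1 + math.exp(-(true_b0 + true_b1 * k)))) for k in ks]
b0, b1 = fit_logistic(ks, successes, trials)
print(b0, b1)
```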


For illustration purposes, the fitted responses for windows with $n = 43$ and with $n = 30$ are shown in Figure 2, along with the mean response of $Y_n$.

Figure 3. Empirical verification of Theorem 2.5 for $h^A$ (a) and $h^R$ (b) without noise. Contours of the fitted logistic regression model (gray), the 93% success rate contour (dashed), and $1/(2\log n)$ (solid). Figure 2 shows vertical slices at $1/43$ (a) and $1/30$ (b).

The phase transition behavior is often observed through the fractional sparsity ratio $k/n$ and the so-called undersampling rate $n/N$ of the matrix, here $1/n$, since $\Psi_{h^A}$ and $\Psi_{h^R}$ have $N = n^2$ columns [24]. Contours of the fitted logistic regression models for time-frequency shift dictionaries with identifiers $h^A$ and $h^R$ are shown in Figure 3(a) and (b), respectively. To facilitate a quantitative inspection of the contours in Figure 3 and the theoretical results of [23], we overlay the contours in Figure 3 with the level curve for 93% success rate (dash) and $1/(2\log n)$ (solid). The curve $1/(2\log n)$ is known to be the threshold for overwhelming probability of successful recovery in the case of Gaussian random matrices for large $n$ [23]. It is observed in Figure 3 that the curve $1/(2\log n)$ remains below the 93% success rate level curve, indicating consistency of the empirical results with the phase transition $1/(2\log n)$ conjectured for the class of time-frequency shift matrices applied to identifiers $h^A$ and $h^R$. Moreover, the curve $1/(2\log n)$ falls increasingly below the 93% success rate level curve as $n$ increases, indicating improved agreement in the large-$n$ limit. Note that this conjectured phase transition $1/(2\log n)$ is larger than that proven in the main Theorem 2.5, both in order (as $u = 0$ here) and in the constant.


As stated earlier, in practice the measurements $\Gamma h$ are observed with noise, and although $\Gamma$ can be well approximated by a $k$-sparse representation, it is rarely strictly $k$-sparse. For both of these reasons, equality-constrained Basis Pursuit is not often used in practice; rather, its relaxation $\min\|x\|_1$ subject to $\|\Psi_h x - \Gamma h\|_2 \leq \varepsilon$ is used to allow for an inexact fit of the measurements.


In Figure 4 we empirically test Theorem 2.5 using the inequality-constrained relaxation rather than exact Basis Pursuit as the reconstruction algorithm. We choose the same values of $k$ and $n$ and perform the same number of tests as for Figure 3. The non-zero entries of $x$ are also drawn from the same distribution as was used to generate Figure 3. Additive noise is simulated at a level of 25 dB signal-to-noise ratio; that is, $\eta$ is added to $\Gamma h$, with the entries of $\eta$ drawn independently from the Gaussian $N(0,1)$ and $\eta$ normalized to $\|\eta\|_2 = \|\Gamma h\|_2\cdot 10^{-5/4}$.
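The noise normalization amounts to a single rescaling, since $10^{-5/4} = 10^{-25/20}$ corresponds to a 25 dB amplitude ratio. In this sketch the real and imaginary parts of $\eta$ are both drawn from $N(0,1)$, which is our assumption for complex measurements. The observation vector is made up, and `add_noise` is a hypothetical name.

```python
import math
import random


def l2norm(v):
    return math.sqrt(sum(abs(a) ** 2 for a in v))


def add_noise(y, snr_db, rng):
    """Return (y + eta, eta) with ||eta||_2 = ||y||_2 * 10**(-snr_db / 20)."""
    eta = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in y]
    scale = l2norm(y) * 10 ** (-snr_db / 20) / l2norm(eta)
    eta = [scale * e for e in eta]
    return [a + e for a, e in zip(y, eta)], eta


y = [1 + 2j, -0.5, 3j, 0.25 - 1j]  # made-up observation vector Gamma h
y_noisy, eta = add_noise(y, 25.0, random.Random(5))
fit_eps = l2norm(eta)  # inequality fit parameter chosen at the noise level
print(20 * math.log10(l2norm(y) / l2norm(eta)), fit_eps)
```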

Figure 4. Empirical verification of Theorem 2.5 for $h^A$ (a) and $h^R$ (b) in the noisy setting, with Basis Pursuit replaced by its inequality-constrained relaxation and additive noise of 25 dB signal-to-noise ratio. Contours of the fitted logistic regression model (gray), the 93% success rate contour (dash), and $1/(2\log n)$ (solid).

Unlike exact Basis Pursuit, whose solution can be exactly $k$-sparse and can be computed numerically to arbitrary precision, the relaxed program applied to noisy measurements will not recover the solution exactly. For our numerical experiments involving noisy measurements, the coefficient vector $x$ associated with $\Gamma$ is therefore only considered to have been successfully recovered if the $k$ largest-magnitude entries of the recovered vector $x'$ have the same support set $\Lambda$ as $x$. Alternative metrics of successful recovery, such as the reconstruction error or the signal-to-noise ratio (SNR), are less demanding than requiring a match of the support set; moreover, the support set metric was previously examined in this setting by Wainwright [55], and following this convention allows for a more direct comparison. The inequality fit parameter $\varepsilon$ is selected to be at the noise level.
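The support-based success criterion can be stated in a few lines. This is a sketch with a hypothetical function name; ties among entries of equal magnitude are broken by index order.

```python
def support_recovered(x_true, x_rec, k):
    """True if the k largest-magnitude entries of x_rec sit exactly on the support of x_true."""
    true_support = {i for i, v in enumerate(x_true) if v != 0}
    if len(true_support) != k:
        raise ValueError("x_true must be exactly k-sparse")
    # sorted() is stable, so equal magnitudes keep their original (index) order.
    top_k = sorted(range(len(x_rec)), key=lambda i: abs(x_rec[i]), reverse=True)[:k]
    return set(top_k) == true_support


x_true = [0, 2, 0, -1j, 0]
good = [0.01, 1.9, 0.05, -0.8j, 0.02]  # small errors off the support
bad = [0.9, 1.9, 0.05, -0.1j, 0.02]    # wrong index among the two largest entries
print(support_recovered(x_true, good, 2), support_recovered(x_true, bad, 2))
```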


As in the noiseless setting, we approximate the probability distribution of the empirical observations $Y_n$ using the logistic regression model (24). Contours of the fitted logistic regression models for time-frequency shift dictionaries with identifiers $h^A$ and $h^R$ are shown in Figure 4(a) and (b), respectively. Overlaying these contours are the level curve for 93% success rate (dash) and $1/(2\log n)$ (solid). Unlike in the noiseless case, it was shown that the threshold for overwhelming probability of successful recovery for Gaussian random $n \times n^2$ matrices with noise, using the relaxed program, is $1/(4\log n)$ [55]; however, we observe in Figure 4 that $1/(2\log n)$ fits the empirical data better in this instance. As Wainwright considered the Gaussian setting, this empirical observation for the Gabor system does not contradict the results in [55], but the difference is noteworthy.
Figure 5. Empirical verification of Theorem 6.3 without noise. For the random windows with $n = 30$: the fraction of successful recovery based on $\Psi_{h_1^R}$ (dash-dot), on $\Psi_{h_1^R}$ and $\Psi_{h_2^R}$ (solid), and on a larger set of test vectors (dash).


In Figure 5 we illustrate the performance of Basis Pursuit when using multiple test signals, as discussed in Section 6, in particular in Theorem 6.3. Figure 5 was obtained using the same procedure that provided
Figure 